ABSTRACT (UNCLASSIFIED)


This report presents the results of a feasibility study investigating the characteristics of complex
Zernike moments and their application in translation-, scale- and rotation-invariant object
recognition problems. The complex Zernike moments are used as characterising features in a
neural network based target recognition approach for the classification of objects in images
recorded by sensors mounted on an airborne platform. The complex Zernike moments are a
transformation of the image obtained by projecting the image onto an extended set of orthogonal
polynomials.

The emphasis of this study lies on evaluating the performance of Zernike moments in
combination with neural networks. To this end, three types of classifiers are
evaluated: a multi-layer perceptron (MLP) neural network, a Bayes statistical classifier and a
nearest-neighbour classifier. Experiments are based on a set of binary images simulating military
vehicles extracted from the natural background. From these experiments the conclusion can be
drawn that complex Zernike moments are efficient and effective object characterising features
that are robust under rotation of the object in the image and, to a certain extent, under varying
affine projections of the object onto the image plane.
SUMMARY (UNCLASSIFIED)

This report describes the results of the feasibility study Neurale Netwerken voor een RPV
Monitor (Neural Networks for an RPV Monitor). The study examined the properties of complex
Zernike moments and their possible applications in the context of translation-, scale- and
rotation-invariant object recognition. The complex Zernike moments serve as characteristic
features for recognising objects in images recorded by sensors mounted on an airborne platform.
The Zernike moments themselves are a transformation of the image by means of a projection onto
an extended set of orthogonal polynomials.

The emphasis of the study lies on the evaluation of Zernike moments in relation to neural
networks as a classification mechanism. To this end, three types of classifiers were tested,
namely a multi-layer perceptron neural network, a nearest-neighbour classifier and a Bayes
estimator. The experiments made use of a database of images containing contours of military
vehicles. The experiments show that transforming an image into complex Zernike moments is an
efficient and effective way to characterise object contour features, both under rotation of the
object in the image plane and, to a certain extent, under changes in the direction of projection
of the object onto the image.
1. AUTOMATIC TARGET RECOGNITION

1.1  Introduction 

Surveillance and reconnaissance operations are of great importance in any imaginable conflict
situation, but also in complex and politically sensitive circumstances such as arms reduction
verification operations and humanitarian operations. Within this context, reconnaissance
platforms and satellites are widely used for surveillance tasks on a global scale. On a more local
scale, the application of Remotely Piloted Vehicles (RPVs) emerges as an efficient and effective
alternative. Their flexibility in use and their cost effectiveness have led various countries
to incorporate RPV systems into their Recce patrols [Hooton and Munson, '92].
These platforms constantly monitor large areas of the world. A never-ending stream of
information, embedded in recorded images, must be extracted, analysed and interpreted by human
operators. Therefore, many of today's military platforms would benefit greatly from an automatic
object recognition capability in a system that is sophisticated enough to replace the
man in the loop, i.e. the human operator who is still necessary or mandatory for the final
identification, verification, or last check in today's semi-autonomous recognition systems.

Rapid developments in sensor technology and in signal processing algorithms and hardware have
made it possible to equip platforms, and especially RPV systems, with advanced sensor systems
consisting of combinations of visible light and infrared cameras or FLIR and CO2 laser radar.
The images or video generated by the sensor systems are broadcast to a control centre on the
ground or brought back by the platform recorded on tape. This information must be processed
and analysed by human operators. The large amount of information that becomes available and
the possibly stressful conditions under which the analysis and identification work has to be
performed justify any attempt to automate this process.

In this document we present the results of evaluating the performance of Zernike moments, a
particular algorithm that may serve as one of many functional blocks in an experimental, yet to be
developed, automatic target recognition system. As will be explained later, a target recognition
system consists of several different stages. The algorithm described in this document is a solution to
the problem of translation, scale and rotational variances in the appearance of objects in an image
[Dudani et al., '77]. These variances arise naturally in the images since a sensor platform may
fly at different altitudes or make use of different focal lengths of the camera, resulting in varying
object dimensions. Furthermore, a platform may approach objects from different angles, resulting
in varying rotational positions and affine projections of objects onto the image plane.

1.2 Automatic Target Recognition Scheme

The aim of a vision-based automatic target recognition system is the classification or
identification of located objects in an image or sequence of images. In this context, classification
can be considered as assigning a class label to a located object. Class labels identify and
discriminate different classes according to a given taxonomy. The definition of a class or subclass
is based on the selection of a set of features or characteristics that are unique and consistently
present in all members of the class.

In the same way, recognition can be regarded as the process of identifying a unique
member (instance) of a given class. Recognition is based on the identification of unique
characteristics of one specific object and lies beyond the scope of the project described in this
document.

The target recognition process may be subdivided into five stages: preprocessing, object location,
object segmentation, feature extraction and object classification. The different stages in the
processing scheme and the relations between them are depicted in Figure 1.1.

The aim of the image preprocessing stage is to optimise the starting conditions of the
classification trajectory. Knowledge about sensor characteristics or signal propagation, for
example, may be used to improve the contrast between the object and the background. Two
frequently used preprocessing filters in image processing are the median filter and the Gaussian
filter. The median filter replaces each pixel by the median of a small neighbourhood. The
Gaussian filter performs a weighted averaging in which the weighting coefficients effectively form a
2-dimensional Gaussian. Besides local image operators, there are global operators that influence all
image points. An example of a global operator is contrast enhancement by histogram averaging.
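
As an illustration of a local preprocessing operator, the median filter described above could be
sketched in C roughly as follows. This is a minimal sketch and not the routine used in the project:
the function name median3x3, the 3x3 neighbourhood and the choice to copy border pixels unchanged
are assumptions made here for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort on unsigned char values. */
static int cmp_uchar(const void *a, const void *b)
{
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* 3x3 median filter on a row-major 8-bit image of width x height pixels.
 * Border pixels are copied unchanged (an assumption of this sketch). */
void median3x3(const unsigned char *in, unsigned char *out, int width, int height)
{
    assert(in != NULL && out != NULL && width > 0 && height > 0);
    memcpy(out, in, (size_t)width * (size_t)height);
    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            unsigned char win[9];
            int k = 0;
            /* Collect the 3x3 neighbourhood around (x, y). */
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    win[k++] = in[(y + dy) * width + (x + dx)];
            /* The median is the middle element of the sorted window. */
            qsort(win, 9, sizeof win[0], cmp_uchar);
            out[y * width + x] = win[4];
        }
    }
}
```

Applied to a uniform patch containing a single bright outlier, the filter replaces the outlier by
the surrounding value, which is the behaviour that makes it attractive for impulse noise.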
Figure 1.1 Schematic overview of the different subsystems of an automatic target recognition
system.

The second stage in the classification trajectory is the location of potentially interesting objects in
the image. Texture information or brightness information is frequently used to select interesting
areas that have to be processed further. Solving the object location problem is far from trivial.
Moving objects may be detected by evaluating the optical flow of pixels in a sequence of images.
Pixels representing the object can be distinguished from background pixels because object pixels
have different optical flow vectors. Another approach may be based on combining multi-spectral
images, e.g. visible light and infrared images. Hot-spots (large concentrations of thermal radiation)
in infrared images may give clues to the position of objects in the visible light images.

After locating the object position in the image, the object has to be separated from its background
as accurately as possible. Again, brightness information can be used in combination with contour
information. Most artificial or man-made objects have convex borders that can be separated from
concave borders. Edges may be found in a grey scale image by filtering the image with an
edge detector and thresholding the result. The accuracy of this process influences to a large
extent the reliability of the further processing.

The next step in the classification trajectory is the calculation of geometric and spectral features
of each segmented region. Geometric features include the average intensity value of the segment,
the occupied area and the ratio of its length to its width. This information can be captured in the
segment moments. The curvedness of the segment border can also be considered. This
information can be extracted from so-called Fourier descriptors. Spectral information can give a
measure of the texture of the region. In general, we are looking for those object
features that are invariant under translation, scaling and rotation of the object in the image, since
those parameters are very difficult to control in our application. For an extended survey of
invariant features or representations of image objects, see for example [Toet, '92].

Finally, the object classification is based on the statistical evaluation of the feature attribute
values and probabilities. Nearest-neighbour and minimum-mean-distance classifiers or neural
networks can be used to subdivide the multi-dimensional feature space into subspaces by decision
boundaries [Duda and Hart, '73][Schalkoff, '92]. Each subspace or set of subspaces represents a
class. If a feature vector lies in one of the subspaces, the related class label is assigned to the
object. On the other hand, if different sources of evidence for a particular object are available, e.g.
in a multiple-sensor system, other techniques like the Dempster-Shafer theory [Shafer, '76] may
be used in combination with a knowledge-based system to perform the classification task.
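
To make the notion of assigning a class label from the feature space concrete, a nearest-neighbour
rule could be sketched in C as below. This is an illustrative sketch only: the function name
nearest_neighbour, the flat row-major prototype layout and the use of the squared Euclidean
distance are assumptions, not a description of the classifiers evaluated later in this report.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Returns the label of the prototype closest to feature vector x.
 * prototypes holds n_proto rows of dim values each (row-major);
 * labels holds one class label per prototype.
 * Distance is the squared Euclidean distance (an assumption of this sketch). */
int nearest_neighbour(const double *x, const double *prototypes,
                      const int *labels, int n_proto, int dim)
{
    assert(x != NULL && prototypes != NULL && labels != NULL);
    assert(n_proto > 0 && dim > 0);
    double best = DBL_MAX;
    int best_label = labels[0];
    for (int i = 0; i < n_proto; i++) {
        double d = 0.0;
        for (int k = 0; k < dim; k++) {
            double diff = x[k] - prototypes[i * dim + k];
            d += diff * diff;
        }
        if (d < best) {          /* keep the closest prototype seen so far */
            best = d;
            best_label = labels[i];
        }
    }
    return best_label;
}
```

The decision boundaries mentioned above are implicit here: every point of the feature space is
assigned to the class of its closest stored prototype.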

1.3  Invariances 

As stated above, the problem of invariances is our main point of interest. In the literature, two
alternative approaches to the problem of invariances can be found that have been shown to be
reasonably successful: the application of moments making use of moment invariants [Hu, '62]
and boundary encoding making use of Fourier boundary descriptors [Persoon and Fu, '77]. The
latter is also known to be invariant under affine transformations [Arbter, '89][Arbter, Snyder,
Burkhardt and Hirzinger, '90]. Both techniques suffer from the disadvantage of being
computationally intensive. Recently, a fast alternative has been developed and published by
Schau [Schau, '92]. It is based on analysing basic 2 by 2 elements of a binary image. The number
of basic elements is used to define a distance measure between the unknown object and a set of
known prototypes. This technique is interesting because of its simplicity and its potential for fast
implementation.

In spite of their complexity, we have chosen to investigate the invariance properties of the complex
Zernike moments because of the promising results presented in the literature. To obtain the
Zernike moments we need a radially symmetric transformation, which may be accomplished by a
combination of a circular Fourier transform and a radial Mellin transform [Grace and Spann,
'91][Sheng and Arsenault, '86][Sheng and Duvernoy, '86]. In our experiments, however, we made
use of predefined templates, which reduced the computations to simple image multiplication. The
complexity of the algorithm can be further reduced by performing a polar transform on the image
followed by simple vector operations to obtain the desired moments. This will be discussed in
one of the following sections.

1.4 Problem Definition

The goal of the project Haalbaarheidstudie Neurale Netwerken is to demonstrate that artificial
neural networks are a powerful paradigm and provide useful building blocks to construct a robust
and reliable target recognition system.

The target recognition processing scheme, as presented above, consists of several modules. In
theory, neural networks can be used to implement operations for each individual module [Roth,
'90]. Though all modules are more or less equally important in an actual system, we have decided
to focus on the feature extraction and classification modules only. Other topics are covered by
various TNO-FEL research projects.

The location problem, for example, can be solved by integrating multi-sensor information
originating from visible light and infrared sensors. An infrared image can be used for object
location by detection of hotspots, characteristic radiation patterns in the infrared part of the
spectrum. Moving objects may be located by searching for irregularities in the optical flow field
calculated from a sequence of images. Moving objects induce optical flow vectors that are
different from the vectors due to the motion of the sensor platform [Beck, '92].

In our approach, neural network algorithms are embedded in the classification module only.
Neural networks may also be used for feature extraction [Perantonis and Lisboa, '92], but since
we want to have complete control over the features that are used by the system, the feature
extraction concepts within this most important module will be based on classical techniques.

This document starts with the introduction of the geometric and Zernike moments in Section 2.
The geometric moments will be applied to resize the object in the image being processed to a
uniform area and to place the resulting object representation in the centre of the image. The
Zernike moments will be evaluated to capture rotation invariant object features in a consistent
manner. These features will be used to classify the object, regardless of its rotated position in the
scene.

The evaluation of Zernike moments is a computationally complex task. In Section 3, we describe
how the geometric and Zernike moment extraction process can be implemented efficiently in
software. Furthermore, an alternative solution is presented for real-time implementations. In
Section 4 the context and the setup of the experiments to evaluate the Zernike moments as
invariant features for object recognition are presented. An overview of the database comprising
test images of military vehicles is given, and the different classification approaches, based both on
neural networks and on classical techniques, are described. Finally, in Section 5, the results of the
numerous experiments are presented and conclusions are drawn regarding the usefulness of
Zernike moments within the context of automatic target recognition.

The  research,  of  which  the  results  are  presented  in  this  report,  has  been  done  within  the 
framework  of  the  feasibility  study  RPVM-NN  which  has  been  carried  out  for  the  HOE-AI  project 
Haalbaarheidstudie  Neurale  Netwerken  (Gaining  Experience  with  Artificial  Intelligence)  under 
responsibility  of  the  Development  Centre  for  the  Automation  of  Weapon  and  Command  systems 
of  the  Directorate  of  Economy  and  Finance  of  the  Royal  Netherlands  Army 
(DEBKL/DCAWACO). 
2. GEOMETRIC AND ZERNIKE MOMENTS AND THEIR APPLICATIONS

In this section some fundamentals related to moments and functions of moments are presented.
These fundamentals will be used in the following sections to explain the design of the translation,
scale and rotation invariant target recognition system evaluated in this report.

In Section 2.1 an overview of recent literature related to various types of moments is presented.
In Section 2.2 the geometric moments are introduced, whereas in Section 2.3 the emphasis lies on
the complex Zernike moments.

2.1 Moments and Functions of Moments

Moments and functions of moments are powerful object characterising features in classification
problems involving invariant recognition of 2-dimensional patterns in an image. Moments are
actually projections of the image function onto a set of polynomials whose
variables are the pixel coordinates. If we use a set of mutually orthogonal polynomials, we
obtain a set of uncorrelated, non-redundant characteristics. Fundamentals related to the theory of
moments can be found in [Gonzalez and Wintz, '77] and [Rosenfeld and Kak, '82].

The concept of moment invariants for pattern recognition was introduced by Hu in 1962 [Hu,
'62]. Hu derived a set of moment invariants that has the property of being invariant under
image translation, scaling and rotation. An example of the application of moments in a problem
context similar to ours is given in [Dudani et al., '77]. Teague [Teague, '80] suggested the use of
orthogonal moments based on the theory of orthogonal polynomials and introduced the Zernike
moments, which provide independent moment invariants to an arbitrarily high order. Other
moments are the pseudo-Zernike moments, rotational moments [Boyce and Hossack, '83],
complex moments [Abu-Mostafa and Psaltis, '84][Abu-Mostafa and Psaltis, '85] and the
Legendre moments based on the Legendre polynomials [Teh and Chin, '88].

Satisfactory experimental outcomes have kept moments under considerable attention in the literature
up to the present. For example, a revised fundamental theorem concerning moments is given in
[Reiss, '91] and an extended survey of moment based techniques for recognition is presented in
[Prokop and Reeves, '92].


In the context of pattern recognition, moment invariants can be considered reliable and robust
features if their values are insensitive to image noise. Detailed results concerning noise sensitivity
and an approach to select an optimal set of moment features are given in [Teh and Chin, '88] and
[Khotanzad and Hong, '90], respectively. In theory, most of the image information can be
reconstructed by using a sufficiently large number of a particular set of image moments. This fact
can be used to select the appropriate number of object features necessary to solve a classification
problem.

Having captured the image information in a set of moments still leaves us with the actual
classification problem. In [Khotanzad and Lu, '90] moments are used in combination with a
multi-layer perceptron neural network as a promising solution to the translation, scaling and
rotation invariant classification of objects in an image. Broadly speaking, we have adopted their
approach in building an invariant object recognition system. In this section the geometric and
Zernike moments are introduced, whereas the multi-layer perceptron neural network
is presented in Section 4.

2.2  Geometric  Moments 

The geometric or regular moments are projections of the image function onto the monomials $x^p y^q$,
where $x,y$ are the image coordinates. The regular moments of order $(p+q)$ of the image
function $f(x,y)$ are defined as

$$M_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^p y^q f(x,y)\,dx\,dy, \tag{2.1}$$

where $p,q = 0,1,2,\ldots,\infty$ and $M_{pq}$ is the $(p+q)$th order moment of the continuous image
function $f(x,y)$. Assuming that the image function $f(x,y)$ is piecewise continuous and has compact
support, moments of all orders exist and the infinite set of moments uniquely determines $f(x,y)$.
Conversely, the moments are themselves uniquely determined by $f(x,y)$. For digital images the
integrals in Equation 2.1 are replaced by summations and $M_{pq}$ becomes


$$M_{pq} = \sum_{x}\sum_{y} x^p y^q f(x,y). \tag{2.2}$$

As mentioned above, the definition of regular moments has the form of the projection of the
image function $f(x,y)$ onto the monomial $x^p y^q$. The disadvantage of using geometric moments is
that the basis set $\{x^p y^q\}$, while complete, is not orthogonal. Therefore, information captured in
these moments is redundant.
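
The summation for digital images translates almost directly into code. The following C function
is a minimal sketch under stated assumptions: the image is taken to be an 8-bit, row-major array
with pixel coordinates starting at 0, and the names geometric_moment and ipow are introduced here
for illustration only.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Integer power by repeated multiplication; ipow(b, 0) is 1, also for b == 0. */
static double ipow(double base, int exp)
{
    double r = 1.0;
    while (exp-- > 0)
        r *= base;
    return r;
}

/* Regular moment M_pq of a row-major 8-bit image f of width x height pixels:
 * the sum over all pixels of x^p * y^q * f(x, y). */
double geometric_moment(const unsigned char *f, int width, int height, int p, int q)
{
    assert(f != NULL && width > 0 && height > 0 && p >= 0 && q >= 0);
    double m = 0.0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            m += ipow(x, p) * ipow(y, q) * f[y * width + x];
    return m;
}
```

For a binary image, geometric_moment(f, w, h, 0, 0) is simply the number of object pixels, i.e.
the object area.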

Regular moments characterise the spatial distribution of an image. Image statistics such as the
centre of mass and the moments of inertia of an image can be calculated directly from these
moments. The image centre of mass or centroid location $(\bar{x},\bar{y})$ can be obtained from the
first and zeroth order moments and is given by


$$\bar{x} = \frac{M_{10}}{M_{00}}, \qquad \bar{y} = \frac{M_{01}}{M_{00}}. \tag{2.3}$$

The zeroth and first order regular moments are important for our object classification system
since they will later be used in a preprocessing stage to centre the object in the image. In this way
translation invariance is achieved.
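
A sketch of how the centroid might be obtained in a single pass over the image is given below.
The function name centroid, the return convention and the 8-bit row-major image layout are
assumptions made for this example; the actual centring step of the preprocessing stage is not
shown.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Computes the centroid (M10/M00, M01/M00) of a row-major 8-bit image.
 * Returns 0 on success, or -1 if the image is empty (M00 == 0), in which
 * case the centroid is undefined and *cx, *cy are left untouched. */
int centroid(const unsigned char *f, int width, int height, double *cx, double *cy)
{
    assert(f != NULL && cx != NULL && cy != NULL && width > 0 && height > 0);
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            double v = f[y * width + x];
            m00 += v;        /* zeroth order moment */
            m10 += x * v;    /* first order moment in x */
            m01 += y * v;    /* first order moment in y */
        }
    }
    if (m00 == 0.0)
        return -1;
    *cx = m10 / m00;
    *cy = m01 / m00;
    return 0;
}
```

Shifting the image so that this centroid coincides with the image centre removes the dependence
of subsequent features on the object position.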


Figure 2.1: Radial polynomials $R_{nm}$ for various values of the order $n$ with $m$ set to 0. Note that
$R_{nm}(1) = 1$ for all $n$ and $m$.

2.3 Zernike Moments

The disadvantage of the regular moments as presented in Section 2.2 is that the basis set
$\{x^p y^q\}$ is not orthogonal. Therefore, features defined on functions of this basis set are not
optimal with respect to information redundancy and other characteristics. Uncorrelated features can
be derived by making use of orthogonal functions. To this end, the Zernike set of complex polynomials
$\{V_{nm}(\rho,\theta)\}$ is introduced [Zernike, '34]. This set constitutes a complete orthogonal set
defined over the interior of the unit circle, i.e. $x^2 + y^2 \le 1$.

Zernike moments are the projection of the image function $f(x,y)$ onto this set of orthogonal
basis functions $\{V_{nm}(\rho,\theta)\}$. The orthogonal basis functions can be written as

$$V_{nm}(x,y) = V_{nm}(\rho,\theta) = R_{nm}(\rho)\exp(jm\theta). \tag{2.4}$$

Here, the orthogonal basis function is written as the product of a radial polynomial $R_{nm}(\rho)$
and a harmonic function of the angular coordinate (phase component). The definition of the radial
polynomial $R_{nm}(\rho)$ is rather complex and is derived in [Born and Wolf, '75]:

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} (-1)^s \frac{(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\,\rho^{n-2s}. \tag{2.5}$$

The normalisation of the polynomial has been chosen so that for all permissible values of $n$
(degree) and $m$ (angular dependence or repetition),

$$R_{nm}(1) = 1.$$

Several examples of radial polynomials for various combinations of $n$ and $m$, normalised to
$R_{nm}(1) = 1$, are given in Figure 2.1.


In Table 2.1 the explicit form of the polynomials for the first few values of the indices $n$ and $m$ is
given. The polynomials $V_{nm}(x,y)$ are defined for order $n$ with repetition $m$, where
$n = 0,1,2,\ldots,\infty$ and $m$ takes on positive and negative integer values subject to the conditions
that $n - |m|$ is even and $|m| \le n$. The Zernike moment of order $n$ with repetition $m$ for a
continuous image function $f(x,y)$ that vanishes outside the unit circle is


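$$A_{nm} = \frac{n+1}{\pi}\iint_{x^2+y^2\le 1} f(x,y)\,V^*_{nm}(\rho,\theta)\,dx\,dy. \tag{2.6}$$

Here, the symbol $*$ denotes the complex conjugate. For a digital image, the integrals in Equation
2.6 are replaced by summations, resulting in

$$A_{nm} = \frac{n+1}{\pi}\sum_{x}\sum_{y} f(x,y)\,V^*_{nm}(\rho,\theta), \qquad x^2+y^2 \le 1. \tag{2.7}$$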

To compute the Zernike moments of a given image, the centre of the image is taken as the origin
and pixel coordinates are mapped to the range of the unit circle. Those pixels falling outside the
unit circle are omitted during the computation.


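| m \ n | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|-------|---|---|---|---|---|---|---|
| 0 | $1$ | | $2\rho^2 - 1$ | | $6\rho^4 - 6\rho^2 + 1$ | | $20\rho^6 - 30\rho^4 + 12\rho^2 - 1$ |
| 1 | | $\rho$ | | $3\rho^3 - 2\rho$ | | $10\rho^5 - 12\rho^3 + 3\rho$ | |
| 2 | | | $\rho^2$ | | $4\rho^4 - 3\rho^2$ | | $15\rho^6 - 20\rho^4 + 6\rho^2$ |
| 3 | | | | $\rho^3$ | | $5\rho^5 - 4\rho^3$ | |
| 4 | | | | | $\rho^4$ | | $6\rho^6 - 5\rho^4$ |
| 5 | | | | | | $\rho^5$ | |
| 6 | | | | | | | $\rho^6$ |

Table 2.1: The radial polynomials $R_{nm}(\rho)$ for $m \le 6$, $n \le 6$.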

The magnitudes of the Zernike moments are invariant under image rotation. Given a rotation of
an image through an angle $\phi$, the relationship between $A'_{nm}$ and $A_{nm}$, the Zernike moments
of the rotated image and the unrotated one, is given by

$$A'_{nm} = A_{nm}\exp(-jm\phi). \tag{2.8}$$

This relation shows that Zernike moments have simple rotational transformation properties; each
Zernike moment merely acquires a phase shift on rotation. From this relation the desired property
follows: the magnitude $|A_{nm}|$ of a Zernike moment of a rotated image function remains
identical to that before rotation. Hence, the magnitude $|A_{nm}|$ of the Zernike moment can be taken
as a rotation invariant feature of the underlying image function.
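
The phase-shift relation can be made plausible with a short derivation. As a sketch, assume that
the rotated image is written in polar coordinates as $f'(\rho,\theta) = f(\rho,\theta-\phi)$ (this
notation and sign convention are assumptions introduced here). Writing the continuous Zernike
moment in polar form and substituting $\theta_1 = \theta - \phi$ gives

$$
\begin{aligned}
A'_{nm} &= \frac{n+1}{\pi}\int_0^{2\pi}\!\!\int_0^1 f(\rho,\theta-\phi)\,R_{nm}(\rho)\,e^{-jm\theta}\,\rho\,d\rho\,d\theta \\
        &= e^{-jm\phi}\,\frac{n+1}{\pi}\int_0^{2\pi}\!\!\int_0^1 f(\rho,\theta_1)\,R_{nm}(\rho)\,e^{-jm\theta_1}\,\rho\,d\rho\,d\theta_1
         = A_{nm}\,e^{-jm\phi},
\end{aligned}
$$

where the integration limits are unchanged because the integrand is periodic in the angle. Since
$|e^{-jm\phi}| = 1$ for real $\phi$, it follows that

$$|A'_{nm}| = |A_{nm}|\,\bigl|e^{-jm\phi}\bigr| = |A_{nm}|.$$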


TNO  report 


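| Order | Moments | No. of Moments |
|-------|---------|----------------|
| 0 | $A_{00}$ | 1 |
| 1 | $A_{11}$ | 1 |
| 2 | $A_{20}, A_{22}$ | 2 |
| 3 | $A_{31}, A_{33}$ | 2 |
| 4 | $A_{40}, A_{42}, A_{44}$ | 3 |
| 5 | $A_{51}, A_{53}, A_{55}$ | 3 |
| 6 | $A_{60}, A_{62}, A_{64}, A_{66}$ | 4 |
| 7 | $A_{71}, A_{73}, A_{75}, A_{77}$ | 4 |
| 8 | $A_{80}, A_{82}, A_{84}, A_{86}, A_{88}$ | 5 |
| 9 | $A_{91}, A_{93}, A_{95}, A_{97}, A_{99}$ | 5 |
| 10 | $A_{10,0}, A_{10,2}, A_{10,4}, A_{10,6}, A_{10,8}, A_{10,10}$ | 6 |
| 11 | $A_{11,1}, A_{11,3}, A_{11,5}, A_{11,7}, A_{11,9}, A_{11,11}$ | 6 |
| 12 | $A_{12,0}, A_{12,2}, A_{12,4}, A_{12,6}, A_{12,8}, A_{12,10}, A_{12,12}$ | 7 |

Table 2.2: List of Zernike moments and their corresponding number of features from order 0
up to order 12.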

The image function $f(x,y)$ can be expanded in terms of the Zernike polynomials over the unit disk
as

$$f(x,y) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} A_{nm}V_{nm}(x,y), \qquad n-|m| \text{ even}, \tag{2.9}$$

where the Zernike moments are computed as in Equation 2.7.

If the series expansion is truncated at a finite order $N$, then the truncated expansion is the
optimum approximation $\hat{f}(x,y)$ to $f(x,y)$:

$$\hat{f}(x,y) = \sum_{n=0}^{N}\sum_{m} A_{nm}V_{nm}(x,y), \qquad n-|m| \text{ even},\ |m| \le n. \tag{2.10}$$

Though image reconstruction from Zernike moments is not the goal of our algorithm, Equation
2.10 can be used to determine the optimal order of moments that capture the most important
characteristic features of the object to be classified.

A simple error function defined as the difference between the original image function $f(x,y)$ and
the reconstructed image function $\hat{f}(x,y)$ can be used to determine the optimal order $N$ of the
moments. From [Teh and Chin, '88] it is known that under poor signal to noise conditions, the
reconstructed image degenerates when the maximal order $N$ exceeds a certain bound. In
practice, it is not possible to determine an optimal subset of complex Zernike moments for
classification purposes since each image has its unique optimal decomposition into orthogonal
components.

If the maximum order of the Zernike moments is chosen equal to 12, the total number of
moments is equal to 49 (see Table 2.2). In this case, an object is characterised by 49 features,
$A_{0,0}, \ldots, A_{12,12}$. It is these numbers, grouped into a feature vector, that will be used by a
target recognition system as input for the classification process.
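
The count of 49 follows from enumerating, for each order $n$ from 0 to 12, the repetitions
$m \ge 0$ with $n - m$ even, as listed in Table 2.2. A small C helper, given here only as an
illustration (the name zernike_feature_count is hypothetical), reproduces this number:

```c
#include <assert.h>

/* Number of Zernike moments A_nm with 0 <= m <= n and n - m even,
 * summed over all orders n = 0 .. max_order. */
int zernike_feature_count(int max_order)
{
    assert(max_order >= 0);
    int count = 0;
    for (int n = 0; n <= max_order; n++)
        for (int m = n % 2; m <= n; m += 2)   /* m has the same parity as n */
            count++;
    return count;
}
```

Only non-negative repetitions are counted because the magnitudes of $A_{nm}$ and $A_{n,-m}$ coincide
for a real-valued image, so negative repetitions add no new features.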


3. IMPLEMENTATION


To be able to perform the experiments to investigate the performance of Zernike moments as
invariant features for object recognition, we have implemented the equations given in Section 2 in
a straightforward way. All code is written in the programming language C. For some basic image
I/O routines and image preprocessing functions we have made use of the image processing library
SunVision IP [SunVision, '91]. Clusters of functions are combined in several programs, resulting
in a set of command line user commands. These commands can be used to perform relevant
operations on separate images in command line mode. The individual functions themselves may be
used in newly developed programs in the same way as the functions available in the SunVision
library. First, in Section 3.1 some basic considerations concerning implementation aspects,
especially speed, are given. Next, in Section 3.2 the command line functions are described.
Finally, in Section 3.3 the underlying basic functions are presented. For a more detailed
description of the functions we refer to Appendix B.

3.1  Zernike Moments and Complexity

An image can be considered as a discrete array of values of a uniform type. A complex Zernike
moment of a given order and repetition of this image can be obtained by evaluating Equation
2.7 in Section 2. From this equation it follows that for each pixel (x,y) we have to evaluate an
orthogonal basis function V_nm(ρ,θ), where n is the order and m the repetition of the Zernike
moment under consideration. Hence, each position index (x,y) has to be transformed to a range value ρ
and an angle θ.

Since, in general, we are interested in an extended set of Zernike moments, several orthogonal
basis functions are evaluated for each individual array index (x,y). Therefore, the transformation
from Cartesian (x,y) coordinates to polar (ρ,θ) coordinates is done in advance for each array index.
Two arrays are generated, a range image and a phase image, of the same dimensions as the image
to be processed. In the range image, at index (x,y), the range from the image centre to pixel (x,y) is
stored; in the phase image, at index (x,y), the angle of pixel (x,y) in an imaginary coordinate
system positioned at the image centre is stored.

Next, for each Zernike order n and repetition m, and each polar coordinate pair (ρ,θ), the
orthogonal basis function V_nm(ρ,θ) has to be evaluated. As can be seen from Equation 2.4,
V_nm(ρ,θ) is a product of a radial polynomial and a phase component. The complexity of the radial
component depends on the order of the Zernike moment under consideration. The higher the order,
the larger the number of terms in the polynomial that have to be evaluated (see Table 2.1).
Therefore, the processing time required to determine the complex Zernike moment of a
particular order n increases for higher order moments. A first approach to increase processing
speed is based on the calculation of Zernike templates. As can be seen from Equation 2.7, the
determination of a Zernike moment A_nm can be considered as a dot product between the image array
f and the basis function array V_nm. This basis function array will be denoted as the Zernike moment
template of order n and repetition m. The templates can be computed for each order and repetition
in advance. A template is computed by evaluating, for each array index (x,y), the orthogonal basis
function V_nm(ρ,θ) of Equation 2.4, making use of the predetermined range and phase arrays to
transform Cartesian coordinates into polar coordinates (table look-up). In Image 3.2 and 3.3
templates for various Zernike moments are displayed. The influence of the harmonics controlled
by the repetition coefficients can clearly be seen.

Note: Since we only use the absolute values of the complex Zernike moments, processing time can
also be reduced by determining only the absolute value. However, in our implementation, the
complex template array is subdivided into an array containing the real part, an array containing
the imaginary part and a third array comprising the absolute value of the complex template.

The advantage of the template based approach lies in the fact that the computation time of each
individual Zernike moment is independent of the order and repetition: the number of operations
required to process the images is fixed. However, the time needed to calculate the templates in
advance does depend on the specific order.

The most cost-efficient solution one can think of to obtain the Zernike moments of a given image is
based on reducing the multiplication of two images to the multiplication of two vectors. Since for
recognition purposes we are not interested in the phase information of the Zernike moments, we
have already omitted this information in the approach described above (phase information is
needed only for reconstruction purposes). By doing this, we actually multiply each image pixel
positioned on the same concentric circle of radius ρ by the value of the radial polynomial of order n
and repetition m at radial distance ρ. When we integrate the pixels lying on one and the same
concentric circle before multiplying with the radial polynomial, we reduce the number of
multiplications dramatically. One way of summing the pixels lying on a concentric circle is to first
transform the image from Cartesian to polar coordinates. In this representation,
pixels lying in the same row all have the same relative angle with respect to an imaginary
Cartesian coordinate system positioned at the image centre, whereas pixels lying in the same
column all lie at the same radial distance from the centre of this coordinate system.

After  transforming  the  image  coordinates  from  Cartesian  to  polar,  the  integration  of  the  pixels 
lying  on  the  same  concentric  circle  can  be  accomplished  by  simply  summing  the  pixels  lying  in 
one  and  the  same  column.  An  example  of  transforming  an  image  from  Cartesian  coordinates  to 
polar  coordinates  is  given  in  Image  3.1. 

There is considerable computational overhead in transforming the image's coordinate system from
Cartesian to polar, especially when subsampling interpolation techniques are used.
However, the reduction in computational complexity obtained by switching from matrix-matrix
multiplication to vector-vector multiplication is such that the overall result may be
a much faster implementation of the Zernike moments generation function.

Note: For accuracy reasons, it may be necessary to increase the image dimensions when transforming
from Cartesian to polar coordinates, resulting in a vector product of a larger dimension than the
image dimension.

3.2  Functional  Description 

In this subsection we present a set of command line functions related to the extraction of complex
Zernike moments from an image array. For some of the functions described, both a standard
implementation and a fast implementation based on templates are given. During the experiments we
have made use of the fast implementation. The numerical results of the two implementations differ
slightly due to round-off errors in the Zernike template images. However, this has no effect on the
classification process.

3.2.1  zernike_moment

The Zernike moments of an image can be calculated by the user command zernike_moment. The
command generates the Zernike moments of the input image and stores the results in an output
file. For each moment, the real part, the imaginary part and the absolute value of the complex
Zernike moment are stored successively. The moments are calculated from a specified minimal
order up to a specified maximal order for all valid repetition values. Valid moment orders
lie between 0 and 20. For a complete synopsis of the command zernike_moment, see
Appendix B.

The command zernike_moment is based on the functions av_radial_dist_image,
av_angular_image and av_zernike_moment. These functions are described in the
following section.

3.2.2  zernike_fmoment

The implementation of the command zernike_moment is relatively slow. It takes about 1 second
on average for each calculated moment on a Sun SPARCstation 2. Determining all moments up to
order 20 (121 coefficients) therefore takes about 2 minutes. A faster implementation of the
command zernike_moment is available via the command zernike_fmoment. The command
generates the Zernike moments of the input image and stores the results in an output file. For
each moment, the real part, the imaginary part and the absolute value of the complex Zernike
moment are stored successively. The calculation is based on the predefined Zernike moment
templates, which makes it much faster. The moments are calculated from a specified minimal order
up to a specified maximal order for all valid repetition values. Valid moment orders
lie between 0 and 20. For a complete synopsis of the command zernike_fmoment,
see Appendix B.

The command zernike_fmoment is based on the function av_new_zernike_moment. This function
is described in the following section.

3.2.3  zernike_reconstruct

An image can be (partly) reconstructed from its moments. The command zernike_reconstruct
generates an image based on the complex Zernike moments. The quality of the reconstruction
depends on the number of moments available. The image is reconstructed making use of a set of
moments ranging from a specified minimal order up to a specified maximal order. Valid moment
orders lie between 0 and 20. The resulting output image is the sum of
an input image, possibly empty, and the generated image based on the moment coefficients. For a
complete synopsis of the command zernike_reconstruct, see Appendix B.

The command zernike_reconstruct is based on the functions av_radial_dist_image,
av_angular_image and av_reconstruct. These functions are described in the
following section.

3.2.4  zernike_freconstruct

As is the case for the command zernike_moment, a faster version of the command
zernike_reconstruct is available via the command zernike_freconstruct. The command
zernike_freconstruct generates an image based on the complex Zernike moments. The quality of
the reconstruction depends on the number of moments available. The image is reconstructed
making use of a set of moments ranging from a specified minimal order up to a specified
maximal order. Valid moment orders lie between 0 and 20. The resulting
output image is the sum of an input image, possibly empty, and the generated image based on the
moment coefficients. The calculation is based on predefined Zernike moment templates for fast
calculation. For a complete synopsis of the command zernike_freconstruct, see Appendix B.

The command zernike_freconstruct is based on the function av_new_reconstruct. This function
is described in the following section.

3.2.5  zernike_template

As mentioned above, the commands zernike_fmoment and zernike_freconstruct make use of
predefined templates. These templates can be generated by the command zernike_template. The
command zernike_template generates template images of Zernike polynomials in the .VFF file
format [SunVision, '91], making use of a set of moments ranging from a specified minimal order up
to a specified maximal order. Valid moment orders lie between 0 and 20.
For each valid order-repetition combination a separate template is generated and stored in a file.
The template consists of three bands: one band to calculate the real part of the Zernike moment,
one band to calculate the imaginary part of the Zernike moment and one band to calculate the
absolute value of the Zernike moment. This latter band may be omitted since the absolute value
can be obtained from the real and imaginary values. For a complete synopsis of the command
zernike_template, see Appendix B.

The command zernike_template is based on the functions av_radial_dist_image,
av_angular_image and av_make_polynomial_image. These functions are
described in the following section.

3.2.6  norm_image

Before the Zernike moments of an image can be determined, the image object has to be centred
and scaled to a uniform size. This preprocessing may be performed by the command
norm_image. The command norm_image places the image object at the centre of the image by
determining the centre of gravity of the object. The image centre is then translated to this
position. Furthermore, the command normalises the binary (logical) image with respect to the
number of non-zero pixels in the image. The desired number of non-zero pixels in the output
image can be specified by the user.

The input image must be either an avBYTE image with logical values 0 and 255, or an avFLOAT
image with logical values 0.0 and 1.0. For a complete synopsis of the command
norm_image, see Appendix B.

The command norm_image is based on the functions av_smoments and av_norm_image. These
functions are described in the following section.

3.3  Active  Vision  Library  Functions 

All command line user commands are based on SunVision library functions and a selection of the
Active Vision Library functions described below. The Active Vision library is a set of image
processing functions developed by TNO-FEL. A synopsis of each function is given in Appendix
B.

3.3.1  av_radial_dist_image

The function av_radial_dist_image generates a radial distance image having the same dimensions
as the input image. The pixel values in the radial distance image represent the polar coordinate ρ
measured relative to the coordinate system's centre. To this end, the centres of the rectangular
and polar coordinate systems are translated to the image centre. The generated radial distance image
may be used as a look-up table to transform a rectangular (Cartesian) coordinate system into a
polar coordinate system. The radial distance ρ of the transformed grid position (x,y) -> (ρ,θ) is
given by image(x,y). The radial distance coordinate ρ ranges from 0.0 to 1.0. Pixels lying
outside the implicitly defined unit circle are set to -1.0.

3.3.2  av_angular_image

The function av_angular_image generates an angular image having the same dimensions as the
input image. The pixel values in the angular image represent the polar coordinate θ measured
relative to the x-axis in counter-clockwise direction. To this end, the centres of the rectangular
and polar coordinate systems are placed at the image centre. The generated angular image may be
used as a look-up table to transform a rectangular (Cartesian) coordinate system into a polar
coordinate system. The angle θ of the transformed grid position (x,y) -> (ρ,θ) is given by
image(x,y). The angular coordinate θ ranges from 0.0 to 2*pi.

3.3.3  av_make_polynomial_image 

The function av_make_polynomial_image generates a Zernike template image for a complex
Zernike polynomial of a predefined order and repetition. The template combines the radial
polynomial information with the phase information. The template consists of a real, an imaginary
and an absolute part, all stored in different bands. The real template is stored in band 0, the
imaginary template is stored in band 1, and the absolute template is stored in band 2. The template
may be used to determine the complex Zernike moment for the given order and repetition of an image
having the same dimensions as the template. The real, imaginary and absolute moment values can
be obtained by multiplying the image with the appropriate template band and summing the result.

3.3.4  av_zernike_moment

The function av_zernike_moment determines the complex Zernike moment of a given order and
repetition of the input image. The function recalculates for each pixel the radial polynomial
value and the phase information according to Equation 2.6 in Section 2. This is done by a call to
the function av_radial_polynomial and a sine and cosine evaluation. The moment calculation is
done in the polar coordinate system. The transformation of the rectangular coordinate system into
the polar coordinate system is based on a table look-up strategy. Therefore, a radial distance
image and an angular image of the appropriate dimensions serve as input arguments to the
function.

3.3.5  av_new_zernike_moment

The function av_new_zernike_moment determines, like the previous function, the complex
Zernike moment of a given order and repetition of the input image. To this end, the function makes
use of the Zernike template of the correct order, repetition and dimensions. The function
multiplies the input image with the appropriate bands of the template (the real and imaginary
bands, to determine the real and imaginary part of the moment) and adds the pixels of the
resulting image together. This approach is much faster than the approach described for the function
av_zernike_moment. Furthermore, the calculation time is identical for moments of any order,
whereas with the previous approach the calculation time grows exponentially for higher orders.

3.3.6  av_reconstruct 

The function av_reconstruct generates an image based on one single complex Zernike moment
coefficient of a predefined order and repetition. For each rectangular grid position the related
polar coordinates ρ and θ are determined by looking up these values in the look-up tables angular
image and radial distance image. Therefore, a radial distance image and an angular image of the
appropriate dimensions serve as input arguments to the function. Given the radial distance value
ρ, the order and the repetition, the radial polynomial value is obtained by a call to the
function av_radial_polynomial. The reconstruction is based on Equation 2.10 in Section 2. From
this equation it follows that the reconstructed pixel value is a weighted sum of sine and cosine
terms multiplied by the radial polynomial value.

3.3.7  av_new_reconstruct 

The function av_new_reconstruct generates an image based on one single complex Zernike
moment coefficient of a predefined order and repetition. The reconstruction is based on Equation
2.10 in Section 2. To improve the speed of the reconstruction, the approach is based on a
predefined template of the appropriate order, repetition and dimensions. The computations are
reduced to multiplying the real template image with the real Zernike moment coefficient,
multiplying the imaginary template image with the imaginary Zernike moment coefficient and
adding the two resulting images together.
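
The per-pixel operation described here may be sketched as follows. The function name add_moment_contribution is hypothetical, and the sign convention of the imaginary template band is taken as given by the template (an assumption of this sketch). The contribution is accumulated into an existing output image, in line with the command description in Section 3.2, where the result is the sum of an input image and the generated image.

```c
#include <assert.h>
#include <stddef.h>

/* Add the contribution of one moment (a_re, a_im) to an output image,
 * using the real and imaginary template bands t_re and t_im.
 * Assumption: the template already carries the required sign convention. */
void add_moment_contribution(double *out, const double *t_re,
                             const double *t_im, double a_re, double a_im,
                             size_t npixels)
{
    size_t i;

    assert(out && t_re && t_im);
    for (i = 0; i < npixels; i++)
        out[i] += a_re * t_re[i] + a_im * t_im[i];
}
```

Calling this function once for every order-repetition combination builds up the reconstructed image term by term.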

3.3.8  av_radial_polynomial 

The function av_radial_polynomial determines the Zernike radial polynomial of a given order
and repetition for a specific radial distance. The function is based on Equation 2.5 in Section 2.
When the order increases, the evaluation of the function takes longer since more terms of the
power series have to be determined. The increase in time is exponential.


4.  EXPERIMENTS


In  the  previous  section  we  have  described,  at  a  functional  level,  all  the  programs  that  had  to  be 
developed  before  we  could  start  any  experiments  related  to  the  evaluation  of  Zernike  moments 
and  their  application  as  invariant  features  for  object  recognition.  In  this  section  we  focus  on  all 
the  other  issues  that  are  important  to  be  able  to  generate  experimental  data  and  to  evaluate  the 
results. 

First of all, we need test data, i.e. a series of test images recorded under well defined conditions.
Since we decided from the beginning to focus only on the feature extraction and object
identification module of the target recognition processing scheme, we assume that the problems
related to the preprocessing, object location and object segmentation stages, as described in
Section 1, are solved. Especially concerning the object segmentation stage, this is an ideal and
fairly hypothetical situation.

Second, the actual classification is done by a multi-layer perceptron neural network. In Section
4.2 some fundamentals related to this network are repeated. To evaluate the performance of the
neural network as a classifier, we have to compare the network with traditional statistical
classifiers like the nearest neighbour classifier and the multidimensional probability density
function estimator presented by Parzen [Parzen, '62]. The two classical classifiers we have used
for comparison are described in Section 4.3.

4.1  Database Description

To perform experiments under controlled conditions a database of binary images is generated.
The database consists of images of 256x256 pixels stored in SunVision's .VFF file format
[SunVision, '91]. Each image represents an object separated from the background. An image
pixel is represented by a float value. Since the images are bi-valued, there are only two possible
pixel values. The object pixels are indicated by the value 1.0, whereas the background pixels are
indicated by the value 0.0. Each image file comprises exactly one single-banded (mono-colour)
image. An example of an image from the database and its relation to
an image of the actual object can be seen in Image 4.1.

The objects represented in the images are army vehicles and tanks. The database comprises
images of a Leopard 2 tank, an M109 Howitzer, a T-80 main battle tank, a Gepard and an M113
armoured personnel carrier. A list of the database is given in Table 4.1.

The images originate from views taken by a CCD camera of models of the objects at a scale of
1:37. The database consists of five different object types. For each type, images are taken from
three different viewing positions, resulting in three different viewing directions. A viewing
direction can be indicated by two angles representing the relation between the optical axis of the
camera and the plane the object is positioned on. The optical camera axis is directed towards the
origin of a coordinate system whose x and y axes lie in the ground plane and whose z axis is
perpendicular to the ground plane. The angle of the optical axis with the object ground plane is
given by α, whereas the angle of the projection of the optical axis onto the ground plane with the
orthogonal coordinate system is given by θ. Variations in α from 90° towards 0° change the
viewing direction from top view towards horizontal view. Variations in θ from 90° towards -90°
change the viewing direction from front view, via side view, towards back view. In Figure 4.1 the
relations between the viewing direction, the ground plane and the angles α and θ are depicted.

As stated above, the database comprises three viewing directions. The first viewing direction is
represented by {α=90°, θ=0°}, which is equal to a top view. For a top view, the angle θ is
irrelevant. The object may have any orientation in the ground plane, and hence in the image
plane. The second viewing direction is represented by {α=60°, θ=±30°}, which is equal to a
top-semi-horizontal view. The value ±30° indicates that the angle θ is not known exactly and may
vary between -30° and +30°. A value for θ of +30° represents a semi front-side view, whereas a
value for θ of -30° represents a semi back-side view. The third viewing direction is represented
by {α=30°, θ=±30°}, which is equal to a semi-horizontal view. Again the angle θ may vary
between -30° and +30°. As the viewing angle α decreases, the viewing angle θ becomes
more and more relevant, since the front view of a vehicle will in general differ largely from the
side view or back view. Therefore, the range of θ is restricted to values between -30° and +30°.

For each viewing direction six different images of the same object are taken. Rotations and small
translations (relative to the object size) of the object and different focal distances of the lenses
are taken into account. These variations are useful to test the scale, translation and rotation
invariant properties of the various algorithms. As an example, six images of one of the objects in
the database, taken from a top-view camera position, are depicted in Image 4.2. The relative
translation, scaling and rotation of the object in the image plane can clearly be seen. In a real
scenario, the different projections of the objects on the image plane may be due to variations in
the altitude of the platform the sensors are mounted on, different focal lengths of the cameras,
different approach directions of the platform relative to the object or different orientations of the
object relative to the ground.

Figure 4.1  Relations between the sensor viewing direction, the ground plane onto which an
object is positioned and the angles α and θ.

In some cases the projection of an object onto the image plane differs not only due to variations
in the viewing direction, but also due to varying geometric properties of the object itself. An example
of this phenomenon is the rotating turret of a tank. The projection of a tank onto the image plane
may be quite different when the barrel is directed forward than when it is directed sideward. To
model this effect, images are taken of tanks with different positions of the turret. The turret
position is indicated by the angle φ. An angle of φ=0° represents a tank with the barrel directed
forward, whereas an angle of φ=+90° represents a tank with the barrel directed to the right side,
etc. Actual images are taken with barrel angles φ=0°, φ=15°, φ=30° and φ=45°. The dramatic effect
of the turret rotation on the projected view of the object can be seen in Image 4.3. For a
straightforward vision system it is hardly possible to determine that these images originate from
one and the same object. A suggestion on how to solve this problem is given in Section 5.


Table 4.1  Image database overview. The figures within brackets represent the number of
images per given turret angle and viewing direction.


4.2  Neural  Network  based  Classifiers 

In recent years neural networks have evolved into very powerful signal processing algorithms for
all kinds of applications. In this section we will focus on the relation between pattern recognition
and classification with respect to neural networks. A short overview of the basics of neural
network computing is given, where the emphasis lies on the error back-propagation network
paradigm, as used in the multi-layer perceptron (MLP) network.

4.2.1  Neural Networks and Pattern Recognition

The application area of neural networks within the context of object classification is related to the
field of pattern recognition. Pattern recognition can be considered as a process performing a
classification operation on a given input. This implies that we have a predefined taxonomy of
classes, i.e. a formal or informal definition of the way in which members of a class are distinct from
members of other classes. Note: there are networks that do define their own taxonomy, e.g. the
Kohonen self-organising feature maps [Kohonen, '88].

In general, the input for a pattern recognition system consists of a set of measurements comprising
signal characteristics combined into a pattern or feature vector. A classification algorithm has to
determine for each input and for every class of patterns the probability that the input is a member
of that particular class.

The classification is based on weighting features that are discriminating characteristics for the
problem of interest. When the characteristic features are distinct for each class, the classification is
simple. More often, however, characteristic features are not discriminating for all different classes.
Moreover, in real world applications the pattern signatures are often buried in noise, which makes
the specific characteristics even more diffuse. Therefore more than one characteristic, or even a
whole set of characteristics, may be needed to define unambiguous decision criteria.

Despite its importance, the selection of a set of characteristics is often only the beginning of the
solution of the classification problem. The next step lies in the generation of a practical
description of the selected characteristics. This is the real problem, since for many problems a
concise, formal and explicit description of the pattern characteristics is very hard or even
impossible to give. The ability of neural networks to learn by example is therefore the key to
circumventing the necessity of working with explicitly described characteristics. A neural
network obtains knowledge about feature values common to members of a particular class and
their corresponding tolerance regions by processing examples of different classes presented at the
input of the network. This process is called training or learning, by analogy with the behaviour of
biological species.

The  essence  of  the  application  of  a  neural  network  as  a  classifier  lies  in  the  network's  ability  to 
recognise  patterns  defined  as  input  vectors  and  belonging  to  one  particular  class,  as  a  cluster  of 
points  in  a  multidimensional  feature  space:  the  network  can  be  trained  to  respond  to  each  pattern 
belonging  to  a  particular  class  by  activating  only  one  and  the  same  output  node.  The  output  node 
is then assigned to this particular class. Hence, neural networks are capable of learning relevant
classification characteristics by example instead of making use of formal descriptions.


4.2.2  A  Short  Introduction  to  Neural  Networks 

A neural network is composed of many tightly interconnected non-linear processing elements or
computational units. All the units operate in parallel. Each processing element belongs to a group
that is called a layer and only units of different layers are interconnected. In general two types of
layers are distinguished: layers that interact with the environment (input and output layer) and
layers that interact with other layers (hidden layer). By definition all units within one layer have
the same functionality.

Each  unit  in  the  network  computes  a  scalar  output  or  activity  level  as  a  function  of  the  input 
values  to  the  unit.  The  instantaneous  output  values  of  the  individual  nodes  together  define  the 
internal  state  of  the  network.  This  internal  state  can  be  regarded  as  a  short-term  working  memory. 
Long-term  storage  is  achieved  by  modifying  the  patterns  of  interconnection  strengths  among  the 
units,  i.e.  by  modifying  the  weights  associated  with  each  connection.  Changes  in  the  weight 
values  are  determined  by  learning  rules  adapting  the  unit's  response,  and  hence  the  network 
output,  to  changes  of  the  environment.  These  changes  depend  on  the  nature  of  the  input  signals 
and  the  desired  output  responses.  In  this  way  the  network  learns,  i.e.  organises  information  within 
itself. For an overview of neural network learning rules see [Lippmann, '87].

4.2.3  Error  Back-Propagation  Neural  Networks 

The  neural  network  paradigm  that  is  considered  for  the  classification  problem  belongs  to  the  class 
of  error  back-propagation  neural  networks.  Error  back-propagation  is  a  generally  applicable  rule 
to  train  a  network  and  will  be  explained  in  Appendix  A. 

A  typical  back-propagation  neural  network  consists  of  a  three  layer  feed-forward  network 
architecture.  The  three  layers  are  generally  referred  to  as  input  layer,  hidden  layer  and  output 
layer respectively. The input layer can be regarded as the fan-out of the input pattern and hence
consists of a number of nodes that is equal to the dimension of the input vector. The input layer is
fully  connected  to  the  hidden  layer  in  the  same  way  as  the  hidden  layer  is  fully  connected  to  the 
output  layer.  In  some  applications  the  input  layer  is  also  directly  connected  to  the  output  layer  but 
this  option  will  not  be  considered.  In  Figure  4.2  an  example  of  the  topology  of  a  three  layer 
neural  network  is  given.  The  interconnections  between  the  nodes  in  successive  layers  are  depicted 
schematically.  The  network's  topology,  the  activation  functions  of  the  nodes  and  the 
interconnection  strengths  determine  the  input-output  relation  of  the  network. 

The  actual  processing  is  done  by  the  elements  in  the  hidden  layer  and  the  output  layer.  Each 
processing  element  in  the  hidden  layer  receives  one  interconnection  from  each  element  of  the 
input  layer  and  each  processing  element  of  the  output  layer  receives  one  interconnection  from 
each element of the hidden layer. Associated with each of the interconnections is an adaptive
weight $w_{ij}$ where j refers to the originating node and i refers to the receiving node. In addition,
each processing element in the hidden and output layer has one extra constant valued input and an
associated weight factor $w_{i\mathrm{Th}}$. This weight is referred to as the threshold or bias and provides the
network nodes with an extra degree of freedom, making the network more flexible to
approximate a broader range of mappings. In general, the extra input value is set to one.

The output of a processing element is the result of applying an activation function to the weighted
sum of the input values to that element. The activation function that is most commonly used is
the non-linear sigmoid function. Other non-linear activation functions are described in [Shynk,
'90]. A linear activation function is not considered here since the transformation abilities of a
linear three layer network do not exceed those of a linear two layer network. The weighted sum
$s_i$ and the activation function $f$ are given in Equations 4.1a and 4.1b,

$$ s_i = \sum_{j=1}^{N_i} w_{ij}\, o_j + w_{i\mathrm{Th}} \qquad (4.1a) $$

$$ o_i = f(s_i), \qquad f(s_i) = \frac{1}{1 + \exp(-s_i)} \quad \text{if sigmoidal activation is used.} \qquad (4.1b) $$


Here $N_i$ is the number of input connections to node i and $o_j$ the output of node j in the previous
layer.  As  mentioned  before,  the  input  layer  can  optionally  be  connected  to  the  output  layer.  The 
weighted  sum  of  Equation  4.1a  for  the  output  layer  node  then  is  extended  by  an  extra  term 
representing  the  weighted  sum  of  the  input  layer  output. 
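
The forward computation of Equations 4.1a and 4.1b can be illustrated with a short sketch. It is not the implementation used in this study; the function names (`sigmoid`, `layer_output`, `forward`) and the use of NumPy are assumptions made for illustration, and the optional direct connection from input to output layer is left out.

```python
import numpy as np


def sigmoid(s):
    """Sigmoid activation function of Equation 4.1b."""
    return 1.0 / (1.0 + np.exp(-s))


def layer_output(weights, thresholds, inputs):
    """Outputs o_i = f(s_i) of one fully connected layer (Equation 4.1a).

    weights[i, j] is w_ij, the weight from node j of the previous layer to
    node i of this layer; thresholds[i] is the bias weight of node i, whose
    constant input is fixed at one.
    """
    s = weights @ inputs + thresholds
    return sigmoid(s)


def forward(x, w_hidden, th_hidden, w_out, th_out):
    """Forward pass of a three layer network: input -> hidden -> output."""
    hidden = layer_output(w_hidden, th_hidden, x)
    return layer_output(w_out, th_out, hidden)
```

With all weights and thresholds set to zero every weighted sum is zero, so every output node returns f(0) = 0.5; this gives a quick sanity check of the code.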

4.2.4  Learning  Rules 

An error back-propagation network adapts the transformation function performing the optimum
mapping from the input domain to the output domain by a process of learning by example. This
requires that the desired network response for every distinct training input pattern is
known. Furthermore, since the transformation function is required to cover the entire output range, the
input set should be an adequate representation of the whole input domain.

The neural network response to an input pattern is determined by the activation function and the
weight factors assigned to each node. By adapting the weights the network response can be
changed. An optimum set of weight factors is that set for which the network responses to all
training patterns match the desired responses as closely as possible.

Once the network has learned all the transformations of the training set, the network
performance can be evaluated. New input vectors generate output vectors that are the result of an
interpolation process of the mapping from the input domain to the output range. The encoding of
complex decision boundaries by a limited number of parameters (weights) is made possible by the
non-linearity of the transformation functions. Because of this non-linearity, the network tends to
transform a new input in the same way as the training input it most resembles. This is the reason
why neural networks can be successfully used in classification systems.

4.3  Neural  Networks  and  Related  Techniques 

The  principles  of  distributed  computing,  adaptive  networks  and  connectionism  offer  a  promising 
framework  to  solve  various  difficult  classification  and  related  problems.  However,  there  are  also 
certain  drawbacks  and  uncertainties  related  to  these  techniques. 

The  information  in  a  neural  network  is  stored  in  a  distributed  way  in  the  strengths  of  the 
interconnections  between  the  nodes.  These  weights  are  adapted  during  the  learning  phase.  In  this 
phase the network error function is minimised in a finite number of steps. This adaptation of the
weights is a delicate task since the information already stored in the network must not be
lost. The settling of the network is therefore very time consuming (many iterations). Moreover, it
cannot be guaranteed that the final stable state is the global minimum rather than a local minimum
of the network error function.

The  so-called  learning  in  adaptive  neural  networks  is  related  to  the  fitting  of  data  with 
hyperplanes  in  a  multidimensional  space.  Following  this  relation,  the  mechanism  of  interpolation 
between  known  datapoints  (input  vectors)  that  the  networks  are  expected  to  possess,  becomes 
more explicit. In [Broomhead and Lowe, '88] a class of adaptive networks is presented that can be
trained simply by solving a set of linear equations. Here, every input vector is represented by a
set of radial basis functions. Hence, these networks have a learning rule that is guaranteed to
converge in a predefined number of iterations.

Figure 4.2 Schematic view of a three layer neural network comprising an input layer, a
hidden layer and an output layer. The arrows represent interconnections with
adaptable strengths.


Another class of networks that does not have a learning phase at all is the class of probabilistic
neural networks [Specht, '90]. These networks incorporate a Bayes classifier whose decision
surfaces converge to the Bayes-optimal boundaries as the number of training vectors increases.

Which  of  the  three  network  types  is  optimal  for  the  problem  to  be  solved  depends  on  constraints 
such  as  the  available  training  data,  computer  power  and  learning  time.  In  the  next  section  more 
details about the last approach will be presented since the Bayes classifier based on a Parzen-like
PDF estimation is used as a reference to evaluate the neural network based classifier.


In Section 2 we have shown how we can select a restricted number of features that characterise a,
possibly rotated, object in an image. Due to noise and discretization errors, the extracted Zernike
moment parameters, though related to the same object type, are never exactly the same.
Therefore, we need a clustering mechanism that can separate feature vectors related to one object
class from feature vectors of other object classes. To implement this mechanism, we have
selected the Multi-Layer Perceptron classifier, or MLP [Rumelhart et al., '86]. An MLP is a fully
interconnected feed-forward neural network with one or more layers of nodes between the input
and output layer of nodes.

As input to the network, we select the ordered set of Zernike moments. When the maximal order
of moments used is 12, the dimension of the input vector equals 49; for a
maximal order of 20 the number of moments equals 121 (see Table 2.2 in Section 2).
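
The two vector dimensions quoted above can be reproduced by a simple count. The sketch below assumes that a moment of order n and repetition m is counted once for every pair with 0 <= m <= n and n - m even (for a real image the moments with negative repetition are complex conjugates and add no new absolute values). The function name `count_zernike_moments` is hypothetical, and the counting rule is an assumption that happens to agree with the numbers in the text; Table 2.2 remains the authoritative source.

```python
def count_zernike_moments(max_order):
    """Count the (n, m) pairs with 0 <= m <= n <= max_order and n - m even.

    Assumption: only non-negative repetitions m are counted.
    """
    return sum(
        1
        for n in range(max_order + 1)
        for m in range(n + 1)
        if (n - m) % 2 == 0
    )
```

Under this rule, orders up to 12 and 20 give 49 and 121 moments respectively. The same rule gives 36 and 20 moments for maximal orders 10 and 7, which matches the reduced vector sizes listed in the captions of Figures 5.6 and 5.7, although the report does not state these orders explicitly in this section.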

The number of output nodes is equal to the number of object classes to be separated. To each
output node one class is assigned. The network is trained to respond with all output nodes
set to 0, except for the node that corresponds to the class the input belongs to. That
output node is set to 1.
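
The target coding described above is a one-out-of-C coding. A minimal sketch, with hypothetical helper names `target_vector` and `decide`, might look as follows; the decision rule (pick the node with the highest output) is the one used later in the experiments.

```python
import numpy as np


def target_vector(class_index, num_classes):
    """Desired network output: 1 for the node of the true class, 0 elsewhere."""
    target = np.zeros(num_classes)
    target[class_index] = 1.0
    return target


def decide(outputs):
    """Index of the output node with the highest activity level."""
    return int(np.argmax(outputs))
```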

There is one drawback in using MLPs. It has been shown that an MLP with at most two hidden
layers can form any arbitrarily complex decision region in a feature space [Lippmann, '87].
However, there does not exist a specific rule for selecting the appropriate number of nodes in the
hidden layer(s). The optimal network topology can only be determined by trial and error.


4.4  Conventional  Statistical  Classifiers 

Neural  Networks  can  only  be  considered  as  an  acceptable  alternative  to  traditional  classifiers  if 
they  perform  at  least  as  well  as  statistical  classifiers  or  even  outperform  them.  Therefore,  to  be 
able  to  draw  some  conclusions  regarding  the  performance  of  neural  networks,  we  have  to 
compare them with other techniques. We have selected two well-known classifiers for
comparison that will be evaluated in parallel: a Bayes classifier based on the Parzen estimator
and a non-parametric nearest neighbour classifier. The same training vectors that are used to train
the neural network are used either to determine an optimal PDF estimate or to define a distance
measure. Both approaches will briefly be discussed below.


Bayes Classifier

A general approach towards classification problems is to design certain decision rules or
strategies that minimise the expected risk or costs. Such strategies are called Bayes strategies
[statistics], and are applicable to classification problems involving any number of classes. To be
able to implement a Bayes decision rule, we must be able to calculate the probability density
functions for the different classes, i.e. a set of functions describing the probability that a pattern
belongs to each of the individual classes. Although in many cases the a priori probabilities are
known or may be estimated accurately, in other situations a set of examples or training patterns
$P_{C_i,k}$ of the classes $C_i$ to be discriminated is the only information that is available. Parzen [Parzen,
'62] presented a set of probability density function estimators that provide an estimate of the
underlying density provided that it is smooth and continuous. The estimator we selected is given
by

$$ f_{C_i}(P) = \frac{1}{(2\pi)^{N/2}\,\sigma^{N}} \, \frac{1}{K_i} \sum_{k=1}^{K_i} \exp\left[ -\frac{(P - P_{C_i,k})^{T} (P - P_{C_i,k})}{2\sigma^{2}} \right] \qquad (4.2) $$

Here $f_{C_i}(P)$ is the PDF estimator for class $C_i$, $P$ the measurement vector, feature vector or
evaluation vector, $P_{C_i,k}$ the kth training vector of class $C_i$ and $K_i$ the number of training
vectors of that class. The estimator uses the training patterns $P_{C_i,k}$ as kernels, i.e. the patterns
are selected as centres for a set of multivariate Gaussian distributions. As can be seen from
Equation 4.2 the probability density function is a sum of Gaussians scaled by a certain factor.

Given the probability density functions for the classes involved, the Bayes decision rule assigning
pattern $P$ to class $C_i$ is given by $d_B(P) = C_i$ such that

$$ h_{C_i}\, l_{C_i}\, f_{C_i}(P) \ge h_{C_j}\, l_{C_j}\, f_{C_j}(P), \quad \forall j \ne i. \qquad (4.3) $$

Here, $f_{C_i}(P)$ represents the probability density function of class $C_i$, $P$ is an N-dimensional input
vector, $l_{C_i}$ the loss associated with making the decision that $P$ is a pattern of class $C_i$ when the
class is actually another one and $h_{C_i}$ the a priori probability of the occurrence of a pattern of
class $C_i$. The values of the losses are based on the consequences of making an
incorrect decision. Assigning these values is part of the classification problem definition. For
simplicity, in our experiments the losses associated with making a correct decision
are zero and the losses associated with making an incorrect decision are all
set to 1. The widths of the Gaussians are controlled by the parameter $\sigma$. In our experiments, $\sigma$
is set to 1.
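
A Bayes classifier built on a Gaussian-kernel Parzen estimate can be sketched in a few lines. This is an illustration under stated assumptions, not the code used in the experiments: the kernel sum is averaged over the number of training vectors per class, equal a priori probabilities are used when none are supplied, all losses for incorrect decisions are equal, and the names `parzen_density` and `bayes_classify` are hypothetical.

```python
import numpy as np


def parzen_density(p, training_vectors, sigma=1.0):
    """Parzen estimate of a class PDF at point p using Gaussian kernels.

    Each training vector of the class is the centre of one Gaussian of
    width sigma. Assumption: the kernel sum is averaged over the number
    of training vectors of the class.
    """
    training_vectors = np.asarray(training_vectors, dtype=float)
    count, dim = training_vectors.shape
    diff = training_vectors - np.asarray(p, dtype=float)
    sq_dist = np.sum(diff * diff, axis=1)
    norm = (2.0 * np.pi) ** (dim / 2.0) * sigma ** dim
    return np.sum(np.exp(-sq_dist / (2.0 * sigma ** 2))) / (norm * count)


def bayes_classify(p, classes, priors=None, sigma=1.0):
    """Return the label that maximises prior * density (equal losses).

    classes maps a class label to the training vectors of that class.
    Assumption: equal a priori probabilities when priors is None.
    """
    labels = list(classes)
    if priors is None:
        priors = {label: 1.0 / len(labels) for label in labels}
    scores = [priors[c] * parzen_density(p, classes[c], sigma) for c in labels]
    return labels[int(np.argmax(scores))]
```

For long feature vectors the exponentials can become very small; comparing log-densities is a common way to avoid numerical underflow, at the cost of slightly more code.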

Nearest  Neighbour  Classifier 

Where the previous classifier is controlled by the parameter $\sigma$, the nearest neighbour classifier is
a non-parametric classifier. To assign an input vector $P$ to a certain class $C_i$, the nearest
neighbour of $P$ is determined among the set of all available training vectors. The class of this
vector is assigned to the vector $P$. Hence, the unknown input vector $P$ is assigned to class $C_{i^*}$,
following $d(P) = C_{i^*}$ where

$$ i^{*} = \arg\min_{i} \min_{k} d(P, P_{C_i,k}) \qquad (4.4) $$

with i ranging over all classes and k ranging over all members per class and $d(\cdot,\cdot)$ a Euclidean
distance measure.
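
The nearest neighbour rule amounts to a single distance computation over the reference set. A minimal sketch, assuming the training vectors are stacked row-wise in one array with a parallel list of class labels (the function name `nearest_neighbour_classify` is hypothetical):

```python
import numpy as np


def nearest_neighbour_classify(p, training_vectors, labels):
    """Assign p the class label of its closest training vector.

    training_vectors holds one reference vector per row; labels[k] is the
    class of row k. The distance measure is Euclidean.
    """
    training_vectors = np.asarray(training_vectors, dtype=float)
    distances = np.linalg.norm(training_vectors - np.asarray(p, dtype=float), axis=1)
    return labels[int(np.argmin(distances))]
```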

4.5  Classification  Results 

Feature Vector Normalisation

In our classification experiments we are working with object features grouped into feature
vectors. Each feature represents a Zernike moment of a predefined order and repetition. In
Figures 5.3, 5.4 and 5.5 several feature values are listed. From these figures it follows that
different features have different dynamic ranges. Therefore, it is possible that a small group of
features will dominate the characteristic pattern that is represented by the feature vector.

A neural network that is trained with these vectors may trigger on those dominant features only.
To overcome this problem, i.e. to make sure that each feature will be equally weighted, the
features must be normalised. The normalisation consists of the subtraction of the mean and the
division by the standard deviation of the whole set of training samples. As a result, each feature
of the training vectors has zero mean and unit variance before the vectors are input to the network.
The mth feature of the feature vector is normalised by


$$ \hat{p}_m = \frac{p_m - \mu_m}{\sigma_m} \qquad (4.5) $$

where $\mu_m$ and $\sigma_m$ are the sample mean and standard deviation of the mth training features of all
the classes.

Since  the  vectors  are  also  used  in  the  reference  classifiers,  the  vectors  are  also  scaled  to  unit 
length,  i.e.  the  sum  of  the  squared  vector  entries  is  equal  to  1. 
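
The two normalisation steps, per-feature standardisation followed by scaling to unit length, can be sketched as below. The statistics are estimated on the training set only and then reused for evaluation vectors. The function names are hypothetical, and two details are assumptions not stated in the text: the standard deviation uses NumPy's default (division by the number of samples), and constant features or all-zero vectors are left unscaled to avoid division by zero.

```python
import numpy as np


def fit_normaliser(train):
    """Per-feature mean and standard deviation of the training vectors."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0.0] = 1.0  # assumption: leave constant features unscaled
    return mean, std


def normalise(vectors, mean, std, unit_length=True):
    """Standardise each feature, then optionally scale each vector to length 1."""
    z = (np.asarray(vectors, dtype=float) - mean) / std
    if unit_length:
        norms = np.linalg.norm(z, axis=1, keepdims=True)
        norms[norms == 0.0] = 1.0  # assumption: keep all-zero vectors as they are
        z = z / norms
    return z
```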
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5.  EXPERIMENTS  DESCRIPTION  AND  SIMULATION  RESULTS 

In this section the simulation results concerning the application of complex Zernike moments as
rotation invariant object characterising features are presented. The emphasis of this section lies on
the description of the experiments related to the combination of the Zernike moment feature vectors
and the neural network based classifier. In Section 5.1 the implementation of the automatic target
recognition system as described in Section 1 is presented. This reduced scheme comprises only the
feature extraction and classification modules. In Section 5.2 the experiments are described and
motivated and in Sections 5.3, 5.4 and 5.5 the results are interpreted.

5.1 Automatic Target Recognition Scheme (Implementation)

To be able to perform experiments within a realistic context, we have implemented a part of the
automatic target recognition processing scheme as depicted in Section 1. Actually, we only
implemented the feature extraction module and the classification module. The preprocessing
module and the object detection and segmentation modules are omitted because our main interest
lies in the evaluation of the complex Zernike moments as rotation invariant object characterising
features. We do realise, however, that those modules are of even greater importance and even more
difficult to realise in an actual operational system. Consider, for example, the difficulties in
extracting the object silhouette from an original grey-level image as depicted in Image 4.1. This
is a far from trivial problem, especially under changing light and background conditions.
However, solving those problems is an image processing task and not a classification task.

The feature extraction module in our experiments consists of an image normalisation module and a
complex Zernike moments extraction module. The image normalisation module is necessary to
position the object of interest in the centre of the image plane and to scale the object to a uniform
area. This preprocessing is necessary because the numerical values of the extracted Zernike
features depend on the size of the object within the imaginary unit circle superimposed onto the
image plane.

To centre the object of interest, first the centre of mass of the object is determined. The centre of
mass is obtained making use of Equation 2.3 of Section 2.2. The object is then translated and
scaled such that the centre of mass lies in the centre of the image array and the area occupied by the
object has a predefined value (= number of pixels). The translation and scaling of an object in the
image plane is depicted in Image 5.1. In this image also the imaginary unit circle superimposed
onto the image array is shown.
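
The centring and scaling step can be illustrated for a binary silhouette image. The sketch below is not the module used in this study: the centre of mass is computed as the mean pixel coordinate of the object (the usual ratio of first-order to zeroth-order geometric moments, which is assumed to correspond to Equation 2.3), the scale factor is the square root of the area ratio, and the resampling uses a nearest-neighbour inverse mapping chosen only for brevity. The names `centre_of_mass` and `normalise_image` are hypothetical, and the image is assumed to contain at least one object pixel.

```python
import numpy as np


def centre_of_mass(image):
    """Centre of mass (row, column) of the object pixels in a binary image."""
    rows, cols = np.nonzero(image)
    return rows.mean(), cols.mean()


def normalise_image(image, target_area):
    """Translate and scale a binary object so that its centre of mass lies in
    the image centre and its area is approximately target_area pixels."""
    image = np.asarray(image) > 0
    height, width = image.shape
    r0, c0 = centre_of_mass(image)
    scale = np.sqrt(target_area / image.sum())

    # Inverse mapping: for every output pixel find the source pixel it
    # comes from (nearest neighbour resampling, an assumption).
    out_r, out_c = np.indices((height, width))
    src_r = np.rint((out_r - (height - 1) / 2.0) / scale + r0).astype(int)
    src_c = np.rint((out_c - (width - 1) / 2.0) / scale + c0).astype(int)

    inside = (src_r >= 0) & (src_r < height) & (src_c >= 0) & (src_c < width)
    result = np.zeros((height, width), dtype=bool)
    result[inside] = image[src_r[inside], src_c[inside]]
    return result
```

Because of the discrete resampling the resulting area only approximates the requested value, which is one source of the small variations in the moment values mentioned in this section.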

The second module we have implemented is the Zernike rotation invariant features extraction
module. This module is entirely based on the equations presented in Section 2 and the software
modules presented in Section 3. The module makes use of the fast implementation based on the
Zernike moment templates.

We have already mentioned the imaginary unit circle superimposed onto the image plane. This unit
circle is important since the orthogonal basis functions of Equation 2.4, on which the Zernike
moments are based, are only defined within this region. The unit circle makes it possible to extract
Zernike moment coefficients from an image independent of the image size by applying an
appropriate scaling factor when evaluating Equation 2.7, which is based on the area (in number of
pixels) of the unit circle. As a result of this scaling, the Zernike moments extracted from images of
different sizes are almost the same. The image sizes are 64x64, 128x128 and 256x256 pixels.
Small variations in the moment coefficients are due to roundoff errors in resizing the image plane.
The resulting Zernike moments of these images are depicted in Figure 5.1.

For the classification experiments described in the next subsection the Zernike moments up to order
20 are determined in advance for all images in the image database. The absolute values of the
Zernike moments are placed in a feature vector of dimension 121 (order n=[0-20], for all valid
repetitions m). This vector or parts of this vector will be used as an input pattern by the different
classifiers evaluated in the experiments as described in the next subsection.

5.2  Classification  Experiments  within  the  RPV  Monitor  Context 

Within the context of the project as described in Section 1, we have set up three types of
experiments. First, we want to find out how neural network based classifiers perform in
comparison with traditional statistical classifiers. Second, we are interested in the
characteristics of the complex Zernike moments as rotation invariant features for automatic target
recognition. More specifically, we want to find out what the relation is between the dimension of
the input vector, i.e. the number of moments used to characterise an object, and the classification
results obtained by the various classifiers. Considering neural networks, we are also interested in
the relation between the number of hidden nodes in the neural network and the classification result.
Third, we want to find out what the generalisation capabilities of the various classifiers are with
respect to changes in the viewing direction relative to the objects in the images. As we have seen in
Section 4, the appearance of objects can differ dramatically when the viewing direction is changed.
This effect must be taken into account when designing a reliable classifier.

Figure 5.1 Absolute values of the Zernike moments up to order 20 (121 moments for all valid
repetitions) extracted from identical images, varying only in size (64x64,
128x128, 256x256).

The neural network type we have investigated is the well-known Multi-Layer Perceptron (MLP) error
back-propagation neural network as described in Section 4.2. As a reference, we selected the
nearest neighbour classifier and the Parzen estimator [Parzen, '62] as representatives of the
traditional statistical classifiers.

As we mentioned in Section 4, the total number of different classes the classifier has to
discriminate equals five. Three of the classes may be subdivided into subclasses, characterised by
their relative turret position. However, in the experiments described, all subclasses are mapped onto
one and the same main class. In Image 5.2 top view images of all five objects are depicted. As a
reminder, an overview of the image database is listed in Table 4.1 in Section 4.

For all (sub)classes, images taken from different viewing angles are available. However, no
distinctions are made between different viewing angles during evaluation of the results in the first
two experiments.

The  evaluation  of  the  classifiers'  output  is  based  on  determining  the  number  of  correct 
classifications  relative  to  the  total  number  of  evaluation  inputs.  This  means  that  no  attention  is  paid 
to  the  number  of  incorrect  classifications. 

This may blur the interpretation of the classification statistics, since a high classification score in
combination with a relatively high rate of misclassifications may in some situations be judged worse
than a lower classification score in conjunction with a false alarm rate of zero.

Nevertheless, we have decided not to take the false alarm rate into account, since this would bias the
results in such a way that the characteristics of the various classifiers may be obscured (remember
that our main point of interest lies in the evaluation of the performance of complex Zernike
moments as invariant features, not in developing an optimal classifier strategy).

For the MLP and the Parzen estimator, the class label corresponding to the output node with the
highest output value is assigned to the input vector. For the NN classifier, the class label of the
output having the lowest output value is assigned to the input vector. For the classes having
subclasses, we made use of 24 training vectors per class to train the neural network. For the 2
classes without subclasses, we made use of 6 training vectors per class. The same vectors are used
as a reference database for the NN and Parzen classifiers. The number of evaluation input vectors
for the classes with subclasses was equal to 48, whereas the number of evaluation vectors for the
classes without subclasses was equal to 12.
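
As a check on these numbers, the sizes of the training and evaluation sets follow from three classes with subclasses and two classes without. The symbols $N_{\text{train}}$, $N_{\text{eval}}$ and $N_{\text{correct}}$ are introduced here for illustration only; the totals agree with the 84 training and 168 evaluation vectors quoted in the figure captions.

```latex
N_{\text{train}} = 3 \cdot 24 + 2 \cdot 6 = 84, \qquad
N_{\text{eval}} = 3 \cdot 48 + 2 \cdot 12 = 168, \qquad
\text{accuracy} = \frac{N_{\text{correct}}}{N_{\text{eval}}}
```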

5.3  Performance  comparison  between  different  classifiers 

The results of the comparison of the Multi-Layer Perceptron neural network, the nearest neighbour
classifier and the Parzen classifier are given in Figures 5.5a to 5.5d. In each graph the relation
between the number of features in the input vector and the classification accuracy is depicted. From
the successive graphs the influence of the signal to noise ratio on the classification results can be
deduced.


In Figure 5.5a, the classification results are depicted based on input vectors extracted from images
without addition of noise. In Figure 5.5b, the contours of the objects in the input images were first
corrupted by a noise process, resulting in a SNR of 36 dB. In Figures 5.5c and 5.5d results on
images with a SNR of 25 dB and 20 dB are presented. In Image 5.3, the effect of this noise process
on the object boundaries is depicted. As can be seen, the extraction of the correct object contour
becomes more difficult as the SNR decreases. The effect of the image noise on the extracted
Zernike moments can be seen in Figure 5.4.

Looking at the individual graphs, the results of the three classifiers do not differ dramatically. In
general, the Parzen classifier performs worse at high signal to noise ratios for all dimensions of
the input vectors. The MLP neural network and the NN classifier perform equally
well at all SNRs for the highest input vector dimensions. The performance of all classifiers
decreases with decreasing SNR. It is known that the performance may be improved when noise
corrupted input vectors are also used as training vectors or reference vectors. However, for reasons
stated before, we did not take these into account.

A  second  conclusion  we  may  draw  from  these  graphs  is  that  the  total  number  of  features  taken  into 
account  in  the  classification  process,  has  a  positive  effect  on  the  final  classification  results.  This 
can  be  seen  from  the  downwards  slope  of  the  lines  in  the  graphs:  the  higher  the  number  of  features 
in  the  input  vector,  the  higher  the  classification  accuracy. 
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(a) 

Figure  5.5  Classification  results  related  to  the  number  of  featues  in  the  input  vector  using 
1  Zemike  Complex  moments  as  invariant  features.  Here.  MLP,  NN  and  PAR  stand 

for  Multi-Layer  Perception  neural  network.  Nearest  Neighbour  classifier  and 
Par/an  classifier  respectively.  The  neural  network  has  50  nodes  in  the  hidden  layer. 
There  are  5  classes  and  84  input  vectors  for  training  and  168  input  vectors  for 
evaluation,  (a)  Noiseless;  (b)  SNR;  36  dB;  (c)  SNR  25  dB;  (d)  SNR  20dB; 
Figure 5.6  Classification results of Multi-Layer Perceptron neural networks related to the
number of hidden layer nodes, using Zernike complex moments as invariant
features. The neural networks vary in the number of features stored in the input
vectors, ranging from 121, 49, 36, to 20, respectively. There are 5 classes and 84
input vectors for training and 168 input vectors for evaluation. (a) Noiseless; (b)
SNR 36 dB; (c) SNR 25 dB; (d) SNR 20 dB.
Figure 5.7  Classification results of Multi-Layer Perceptron neural networks related to the
signal to noise ratio of the images from which the Zernike complex moments, used as
invariant features, are extracted. The number of nodes in the hidden layer equals 50.
The neural networks vary in the number of features stored in the input vectors,
ranging from 121, 49, 36, to 20, respectively. There are 5 classes and 84 input
vectors for training and 168 input vectors for evaluation.
Figure 5.8  Classification results related to the viewing direction of the training data set and
the evaluation data set. Here, MLP, NN and PAR stand for Multi-Layer
Perceptron neural network, Nearest Neighbour classifier and Parzen classifier
respectively. The neural network has 40 nodes in the hidden layer. There are 5
classes and 56 input vectors for training and 42 input vectors for evaluation. (a)
Viewing direction 0 degrees; (b) Viewing direction 30 degrees; (c) Viewing direction
60 degrees; (d) Viewing direction: all.
5.4  Zernike Moments and Neural Networks

Considering multi-layer perceptron neural networks, there are several network parameters that can
be optimised to tune the classifier. Focusing on the network topology, we may alter the number of
hidden layers and the number of nodes in each layer. Since any continuous mapping from the input
space to the output space can in principle be approximated by a network having only one,
sufficiently large, hidden layer, we decided not to evaluate networks with more than one hidden
layer. However, we did vary the number of nodes in this layer, while at the same time varying the
dimension of the input vector, i.e. the number of Zernike features characterising the object.
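
The dimension of the input vector follows from the highest moment order that is used. The sketch below is a minimal illustration, assuming that only repetitions m with 0 <= m <= n and n - m even are counted (moments with negative repetition are complex conjugates and add no new magnitudes). The function name is hypothetical. Under this assumption the vector sizes 20, 36, 49 and 121 used in Figures 5.6 and 5.7 coincide with maximum orders 7, 10, 12 and 20; the report does not state this mapping explicitly at this point.

```python
def zernike_feature_count(max_order: int) -> int:
    """Count the (n, m) pairs with 0 <= m <= n <= max_order and n - m even.

    For a given order n there are n // 2 + 1 admissible repetitions m.
    """
    return sum(n // 2 + 1 for n in range(max_order + 1))


for max_order in (7, 10, 12, 20):
    print(max_order, zernike_feature_count(max_order))
# prints: 7 20, 10 36, 12 49, 20 121 (one pair per line)
```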

In Figure 5.6 the results of the experiments related to the tuning of the network topology are
depicted. The data sets are the same as described at the beginning of this section. In Figure 5.6a,
the influence of the number of hidden nodes on the classification accuracy is depicted for 4
different input vector dimensions. It is striking that the dimension of the hidden layer has hardly
any effect on the classification accuracy. However, the conclusions drawn from Figure 5.5 are again
valid: the larger the dimension of the input vector, the higher the classification accuracy.

In Figures 5.6b-5.6d, the influence of the number of hidden layer nodes on the classification
accuracy is depicted for various signal to noise ratios of the input image. Again, the relatively
small influence of this parameter on the classification accuracy is evident. On the other hand, the
positive influence of the input vector dimension on the overall result is again clearly seen, especially
in Figure 5.6c. Down to a signal to noise ratio of 25 dB, the classification accuracy for an input
vector dimension of 121 lies at an acceptable level of above 85%. For lower input dimensions, the
accuracy is already unacceptably low. At an SNR of 36 dB, the result for an input vector dimension
of 20 features is below 80%.

The positive effect of the input vector dimension is summarised in Figure 5.7. Since the number of
hidden layer nodes has hardly any influence, this graph is only given for a number of hidden layer
nodes equal to 50. Again, the robustness of the classifier against variations in the signal to noise
ratio for a large number of input features can clearly be seen.

5.5  Generalisation 

The third series of experiments introduced in Section 5.1 is focused on gaining insight into the
generalising capabilities of the various types of classifiers. Here, by generalisation we mean the
capability of a classifier to label a feature vector signature with the correct class label also in those
cases where the signature does not lie within the range covered by the training vector set.
We have simulated this behaviour by training the neural network with vectors originating from
objects monitored with a viewing direction of 0 degrees and evaluating the neural network classifier
with vectors monitored with viewing directions of 30 degrees and 60 degrees. As can be seen from
Image 5.5, the projection of an object onto the image plane changes dramatically when the
viewing direction changes. Therefore, it may happen that an object viewed from one direction is more
similar to another object than to the same object viewed from a different direction.
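
The evaluation protocol described above can be sketched as follows for the nearest neighbour classifier. This is a minimal illustration only: the function names are hypothetical and the two-dimensional feature vectors are made-up values, not data from the experiments. The reference vectors come from one viewing direction and the evaluation vectors from another.

```python
import math


def nearest_neighbour(reference, labels, query):
    """Label of the reference vector closest to `query` (Euclidean distance)."""
    best = min(range(len(reference)), key=lambda i: math.dist(reference[i], query))
    return labels[best]


def accuracy(reference, labels, queries, true_labels):
    """Fraction of query vectors that receive their true class label."""
    hits = sum(nearest_neighbour(reference, labels, q) == t
               for q, t in zip(queries, true_labels))
    return hits / len(queries)


# Made-up 2-D feature vectors, for illustration only (not data from the report).
train_0deg = [[1.0, 0.0], [0.0, 1.0]]      # reference set, viewing direction 0 degrees
train_labels = ["A", "B"]
eval_30deg = [[0.9, 0.2], [0.3, 0.8]]      # same objects, features drift slightly
eval_60deg = [[0.4, 0.6], [0.1, 0.9]]      # same objects, features drift further
true_labels = ["A", "B"]

print(accuracy(train_0deg, train_labels, eval_30deg, true_labels))  # 1.0
print(accuracy(train_0deg, train_labels, eval_60deg, true_labels))  # 0.5
```

In the toy data the first object drifts so far at 60 degrees that it ends up closer to the reference vector of the other class, which is the effect described in the last sentence of the paragraph above.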

In  Figure  5.8  the  results  of  the  generalisation  experiments  are  depicted.  As  a  reference,  in  Figure 
5.8a  the  results  for  the  various  classifier  types  are  presented.  The  classifiers  are  trained  and 
evaluated  with  feature  vectors  originating  from  objects  monitored  from  the  same  viewing  direction 
of  0  degrees. 

In Figure 5.8b the classification results are depicted based on a training set of 0 degrees and an
evaluation set of 30 degrees. In Figure 5.8c the same results are given for a training set of 0 degrees
and an evaluation set of 60 degrees. Finally, in Figure 5.8d an overview of the neural network
performance for the different viewing directions is given.

The generalisation capabilities are quite remarkable for the 30 degree case, especially for larger
input dimensions. However, for the 60 degree case, the results are almost all below 80%. This
can be expected given the variations in the projected images originating from different viewing
directions, as depicted in Image 5.5. When the number of classes is extended, the results will
probably be even worse. Nevertheless, all classifier types do show some degree of generalisation,
which again demonstrates the robustness of the Zernike moments for capturing object
characteristics.
6.  CONCLUSIONS, SUGGESTIONS FOR FURTHER RESEARCH AND CONCLUDING REMARKS

6.1  Conclusions 

From the experiments described and evaluated in Section 5, several conclusions can be drawn
related to the behaviour of Zernike moments in combination with neural networks in a classification
scheme. However, given the limited scope of the project and the experimental set-up, these
conclusions are not generally valid and cannot be extended to an actual operational classification
system. The Zernike moments provide only a solution to the feature extraction and classification
stages of an object recognition scheme. The object detection and segmentation modules are not
covered by this research. Nevertheless, the experiments did give us clear insight into the behaviour
and performance of the complex Zernike moments as rotation invariant object characterising
features.

Effectiveness of Zernike moments

The complex Zernike moments have shown to be a very effective way to characterise objects for a
scale, translation and rotation invariant automatic target recognition system. A classification score
of up to 90 percent has been accomplished given a database with five different military vehicles
monitored under different levels of scaling, translation and rotation in the image plane. Moreover,
the coefficients are robust under scaling of the image plane dimension. However, the classification
results decrease considerably when identical objects viewed from different viewing angles are to
be classified. This problem may be overcome when training examples are available for each object
for all appropriate viewing directions. In short, the moments are robust under scaling,
translation, rotation and noise and to a certain extent under varying affine projections.

Complexity of Zernike moments

The Zernike moments have shown to be a computationally complex approach towards the problem
of rotation invariant object recognition. The computational overhead can be reduced by calculating
the Zernike polynomial coefficients in advance. In this way run-time computation can be
replaced by a large memory holding the Zernike templates: computing power is traded for
computer storage.
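
The trade of computation for storage can be illustrated with a short sketch. It assumes the standard definition of the Zernike radial polynomial and a simple pixel-sum approximation of the projection integral. The function names, the grid mapping and the pure-Python data layout are illustrative assumptions, not the implementation used in this study. The template for a given order n and repetition m is computed once; each new image then only requires a weighted sum over the stored values.

```python
import cmath
import math


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho), for n - |m| even and |m| <= n."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        total += coeff * rho ** (n - 2 * s)
    return total


def make_template(n, m, size):
    """Precompute conj(V_nm) on a size x size grid mapped onto the unit disc."""
    half = (size - 1) / 2.0
    template = {}
    for row in range(size):
        for col in range(size):
            x, y = (col - half) / half, (row - half) / half
            rho = math.hypot(x, y)
            if rho <= 1.0:  # pixels outside the unit circle do not contribute
                theta = math.atan2(y, x)
                template[(row, col)] = radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return template


def moment_from_template(image, template, n):
    """Approximate A_nm of a square image (list of rows) by a weighted pixel sum."""
    pixel_area = (2.0 / (len(image) - 1)) ** 2
    acc = sum(image[r][c] * v for (r, c), v in template.items())
    return (n + 1) / math.pi * acc * pixel_area
```

Because a rotation of the object only changes the phase of each moment, the absolute value of the result can serve as the rotation invariant feature.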

On the other hand, the Zernike moments are efficient in the sense that the contour characteristics are
captured in a small number of coefficients relative to the contour complexity. The relatively
insignificant influence of the number of coefficients on the classification results suggests that the
dominant contour characteristics are captured in the first 10 to 20 coefficients.

Multi-layer perceptron neural network

The multi-layer perceptron neural network proved to be a robust classification system even under
severe signal to noise conditions. However, the MLP neural network did not in any way outperform
the straightforward classical nearest neighbour classifier. Nevertheless, an advantage of a neural
network in an operational system can be a reduced classification time. The extended
example database that is used by the nearest neighbour classifier and the Parzen classifier is, in the
neural network case, encoded in the weight coefficients of the network, which is in general very
efficient. The time to evaluate a given input feature vector can therefore be reduced considerably.
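
A rough operation count illustrates this argument. The sketch below counts only multiply-accumulate operations and ignores the activation functions, thresholds and the final comparison step. The function names are hypothetical. The numbers are those of the Figure 5.5 set-up (84 reference vectors, 121 features, 50 hidden nodes, 5 classes).

```python
def nearest_neighbour_macs(n_reference: int, n_features: int) -> int:
    """One distance computation per stored reference vector."""
    return n_reference * n_features


def mlp_macs(n_features: int, n_hidden: int, n_classes: int) -> int:
    """One weighted sum per hidden node plus one per output node."""
    return n_features * n_hidden + n_hidden * n_classes


print(nearest_neighbour_macs(84, 121))  # 10164
print(mlp_macs(121, 50, 5))             # 6300
```

For this small database the difference is modest. The cost of the nearest neighbour classifier grows linearly with the number of stored examples, whereas the cost of the network depends only on its topology.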

6.2  Suggestions  for  further  Research  and  Concluding  Remarks 

Since the Zernike moments have shown to be an effective solution to the problem of rotation
invariant object recognition, further research in the field of moments in relation to an automatic
target recognition system is appropriate.

In our experiments we only considered the last two stages of the object recognition scheme depicted
in Section 1: the feature extraction and classification modules. Further work may include the
development of robust modules for the first two stages, i.e. the object detection and object
segmentation modules. The object segmentation in particular will be a serious problem in its own
right.

Furthermore, the extraction of the Zernike moments is a time consuming operation. Research into
efficient real-time implementations is required. A suggestion for this problem is given in
Section 3. Moreover, a study comparing the characteristics of the complex Zernike moments and
the other moments mentioned in Section 2 will give insight into the various alternatives and make
possible a qualitative selection of one of the moment types to be implemented in a demonstration
classification system.
Finally, as a second alternative to the automatic target recognition problem, model based object
recognition may be considered. Interesting results have been reported in the literature.
Three-dimensional models or a priori knowledge about the objects of interest may reduce the
sensitivity of the classification system to errors in the segmentation process, still one of the most
difficult problems to be solved in image processing.
Image 3.1  Transformation of coordinate system from Cartesian to polar: (a) Cartesian
coordinates; (b) polar coordinates.
Image 3.2  Zernike complex moments templates: (a) template Z80 (order 8, repetition 0); (b)
template Z81 (order 8, repetition 1).
Image 3.3  Zernike complex moments templates: (a) template Z83 (order 8, repetition 3); (b)
template Z86 (order 8, repetition 6).




Image  4.1  Database  example.  The  significance  of  the  test  image  with  respect  to  the  real-life 
object  can  clearly  be  seen;  (a)  database  example;  (b)  real-life  image; 

Image  (b)  reprinted  with  permission  of  DMKL/DCAWACO. 
Image 4.2  Database example of the relative translation, scaling and rotation of the projection
of an object onto the image plane.

Database example of images of one and the same object with different turret
positions resulting in quite different projections onto the image plane.
Image 5.1  Translation and scaling of the object in the image plane: (a) original image; (b)
centred and scaled object (unit circle superimposed).
Image 5.5  Database example of images of one and the same object viewed from different
viewing directions α: (a) top view, α = 90°; (b) α = 60°; (c) α = 30°.
APPENDIX A  TRAINING NEURAL NETWORKS


Error function

Training a neural network with a set of training vectors comprising examples of all classes under
consideration is equivalent to determining an optimal set of interconnection weights. This
optimal set is found by means of an iterative process. During each iteration step the weights are
changed or adapted so as to implement a gradient descent on an error function E(). This error
function is the mean squared error obtained by the network over the entire training set of input
patterns as given in Equation A.1.


(A.1)


Here $N_p$ is the number of patterns in the training set, $t_{pj}$ the target output value for the j-th
component of the output pattern for pattern $p$ and $o_{pj}$ the j-th element of the actual network
output as a result of the presentation of input pattern $p$. Since in practice the weights are adapted
after each pattern is presented, the learning algorithm departs to some extent from a true gradient
descent in the error function. But with a small enough learning rate the error function will still be
minimized.
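
For reference, one common form that is consistent with the description above is shown below. This is an assumption: the constant factors depend on convention, and the factor 1/2 per pattern follows [Rumelhart et al., '86].

```latex
E = \frac{1}{N_p} \sum_{p=1}^{N_p} E_p ,
\qquad
E_p = \frac{1}{2} \sum_{j} \left( t_{pj} - o_{pj} \right)^2
```

With this choice, $-\partial E_p / \partial o_{pj} = t_{pj} - o_{pj}$, which is the difference between target and actual output on which the error signal of an output node is based.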


Adaptation  of  the  Weight  Factors 

From  the  network  response  to  a  training  input  pattern  and  the  desired  network  output  an  error 
signal  can  be  determined  for  each  output  node.  This  error  signal  is  used  to  adjust  the  weights  of 
the  incoming  connections  to  the  nodes. 

The derivation of the adaptation or learning rules is omitted. A detailed description is given in
[Rumelhart et al., '86]. We confine ourselves to giving the overall results. The basic equations
representing the error signal $\delta_{oi}$ and the learning rule for a node in the output layer are
given in Equation A.2:

$$\delta_{oi} = f'(s_i)\,(t_i - o_{oi}) \qquad \text{(A.2a)}$$

$$\Delta w_{ij}(n+1) = \alpha_o\,\delta_{oi}\,o_{hj} + \beta_o\,\Delta w_{ij}(n) \qquad \text{(A.2b)}$$

$$w_{ij}(n+1) = w_{ij}(n) + \Delta w_{ij}(n+1) \qquad \text{(A.2c)}$$

where $\Delta w_{ij}$ represents the change of the weight factor $w_{ij}$, $o_{oi}$ the output value of
processing element $i$ in the output layer, $t_i$ the desired output (target), $\alpha_o$ the learning rate
of the output layer and $f'(s_j)$ the derivative of the activation function, i.e.

$$f'(s_j) = o_j\,(1 - o_j) \quad \text{if sigmoidal activation is used.}$$

As can be seen from Equation A.2, the change of weight $\Delta w_{ij}$ depends on the error signal
$\delta_{oi}$ of output node $i$, the output value $o_{hj}$ of hidden node $j$ and a constant learning rate
$\alpha_o$. A momentum term $\beta_o$ is introduced to achieve a certain conservatism in the direction
in which the weight factors are adapted. In general the relations $\alpha_o > 0$ and $0 < \beta_o < 1$ are
valid for the learning rate and momentum term. Optimum values for both $\alpha_o$ and $\beta_o$ are
problem specific and can only be found by trial and error. Remark: the same learning rule is
applied to $w_{iTH}$.

It is important to realise that a straightforward error signal can be obtained only for the nodes in
the output layer. An error signal for the nodes in the hidden layer, necessary to update the weights
between the input layer and the hidden layer, is therefore in one sense artificial. However, it can be
shown that the error signal definition $\delta_{hi}$ and the learning rule for the hidden nodes as given
in Equation A.3 indeed decrease the overall network error.

$$\delta_{hi} = \left[ o_{hi}\,(1 - o_{hi}) \right] \sum_{k=1}^{N_o} \delta_{ok}\,w_{ki} \qquad \text{(A.3a)}$$

$$\Delta w_{ij}(n+1) = \alpha_h\,\delta_{hi}\,o_j + \beta_h\,\Delta w_{ij}(n) \qquad \text{(A.3b)}$$

$$w_{ij}(n+1) = w_{ij}(n) + \Delta w_{ij}(n+1) \qquad \text{(A.3c)}$$

Here $\delta_{hi}$ represents the error signal of hidden node $i$, $o_{hi}$ the output value of hidden node
$i$, $\delta_{ok}$ the error signal of output node $k$, $N_o$ the number of nodes in the output layer and
$w_{ki}$ the weight between hidden node $i$ and output node $k$.
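
The learning rules for the output and hidden layers can be sketched as a single per-pattern training step. This is a minimal illustration under stated assumptions: sigmoidal activation in both layers, threshold weights omitted, and illustrative default values for the learning rates and momentum terms. The function and variable names are hypothetical and not taken from the software used in this study.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def train_step(x, target, w_h, w_o, dw_h, dw_o,
               alpha_h=0.5, alpha_o=0.5, beta_h=0.9, beta_o=0.9):
    """Adapt the weights in place for one pattern; return its squared error.

    w_h[i][j]: weight from input j to hidden node i (dw_h holds the previous change).
    w_o[i][j]: weight from hidden node j to output node i (dw_o likewise).
    """
    # Forward pass through the hidden and output layers.
    o_h = [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in w_h]
    o_o = [sigmoid(sum(w * oh for w, oh in zip(row, o_h))) for row in w_o]

    # Error signals: output nodes first, then hidden nodes using the
    # output-layer weights as they were before this update.
    delta_o = [o * (1.0 - o) * (t - o) for o, t in zip(o_o, target)]
    delta_h = [o_h[i] * (1.0 - o_h[i])
               * sum(delta_o[k] * w_o[k][i] for k in range(len(w_o)))
               for i in range(len(o_h))]

    # Weight changes with momentum, output layer.
    for i in range(len(w_o)):
        for j in range(len(o_h)):
            dw_o[i][j] = alpha_o * delta_o[i] * o_h[j] + beta_o * dw_o[i][j]
            w_o[i][j] += dw_o[i][j]

    # Weight changes with momentum, hidden layer.
    for i in range(len(w_h)):
        for j in range(len(x)):
            dw_h[i][j] = alpha_h * delta_h[i] * x[j] + beta_h * dw_h[i][j]
            w_h[i][j] += dw_h[i][j]

    return sum((t - o) ** 2 for t, o in zip(target, o_o))
```

Calling `train_step` repeatedly for every pattern in the training set corresponds to the per-pattern adaptation described earlier in this appendix.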
APPENDIX B  ACTIVE VISION USER COMMANDS


■ name         norm_image - Active Vision image object size converter

■ synopsis     norm_image input.vff output.vff area [-v]

■ description  norm_image normalizes a bi-valued (logical) image with respect to the number
               of non-zero pixels in the image. The input image must be either an avBYTE
               image with logical values 0 and 255 or an avFLOAT image with logical values
               0.0 and 1.0.

               The desired number of non-zero pixels in the output image is given by <area>.

               The command norm_image does its work silently. With the -v (Verbose) option
               given, the centre of gravity of the input image, the scale factor, the area of the
               input image, the desired area and the area of the output image are given.

               Due to interpolation and thresholding, the desired area and the area of the
               output image may slightly differ.

■ options      -v  Verbose option

■ example      Convert a binary object in an image comprising an arbitrary number of pixels
               into an object of 12000 pixels:

               example% norm_image object.vff normed_object.vff 12000 -v

■ see also

■ remarks      based on Active Vision 0.1 function calls
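
The quantities reported by the verbose option can be illustrated with a short sketch. It assumes that the linear scale factor is the square root of the ratio between the desired and the actual object area, since area grows with the square of the linear size. The internals of norm_image are not documented here, so the function below is hypothetical and only shows the arithmetic.

```python
import math


def centre_and_scale(image, desired_area):
    """Return ((row_c, col_c), scale, input_area) for a binary image.

    `image` is a list of rows; every non-zero pixel belongs to the object.
    """
    pixels = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    area = len(pixels)
    if area == 0:
        raise ValueError("image contains no object pixels")
    row_c = sum(r for r, _ in pixels) / area      # centre of gravity, row
    col_c = sum(c for _, c in pixels) / area      # centre of gravity, column
    scale = math.sqrt(desired_area / area)        # linear scale factor
    return (row_c, col_c), scale, area


block = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
print(centre_and_scale(block, 16))  # ((1.5, 1.5), 2.0, 4)
```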
■ name         zernike_template - Active Vision Zernike template images generator

■ synopsis     zernike_template input.vff dst order_min order_max [-v]

■ description  zernike_template generates template images in the .VFF file format of Zernike
               polynomials of order <order_min> up to <order_max>. The range of the valid
               moment order parameters lies between 0 and 20. For each <order><repetition>
               combination a separate template is generated and stored in the file
               <dst><order><repetition>.vff.

               The template consists of three bands: one band to calculate the real part of the
               Zernike moment, one band to calculate the imaginary part of the Zernike
               moment and one band to calculate the absolute value of the Zernike moment.

               The image <src> is only used to determine the size of the template images.

               The command zernike_template does its work silently. With the -v (Verbose)
               option given, the generated template filenames are echoed on screen during
               execution.

■ options      -v  Verbose option

■ example      Obtain the Zernike moment template images from order 0 up to order 20 of the
               same size as the reference image object.vff and store the results in the file(s)
               moment<m><n>.vff:

               example% zernike_template object.vff moment 0 20 -v

■ see also

■ remarks      based on Active Vision 0.1 function calls
■ name         zernike_moment - Active Vision Zernike complex moments generator

■ synopsis     zernike_moment input.vff dst order_min order_max [-v]

■ description  zernike_moment generates the Zernike moments of the input image input.vff
               and stores the results in the output file <dst>. For each moment, the real,
               imaginary and absolute moment value are successively stored.

               The moments are calculated from order <order_min> up to order <order_max>
               for all valid repetition values. The range of the valid moment order parameters
               lies between 0 and 20.

               The command zernike_moment does its work silently. With the -v (Verbose)
               option given, the calculated moment values are echoed on screen during
               execution.

■ options      -v  Verbose option

■ example      Obtain complex Zernike moments from order 0 up to order 12 of the input
               image object.vff and store the results in object.zm:

               example% zernike_moment object.vff object.zm 0 12 -v

■ see also     zernike_fmoment

■ remarks      based on Active Vision 0.1 function calls
■ name         zernike_fmoment - Active Vision fast Zernike complex moments generator

■ synopsis     zernike_fmoment input.vff dst template order_min order_max [-v]

■ description  zernike_fmoment generates the Zernike moments of the input image input.vff
               and stores the results in the output file <dst>. For each moment, the real,
               imaginary and absolute moment value are successively stored. The calculation
               is based on the predefined Zernike moment templates

               <template><order><repetition>.vff

               for fast calculation.

               The moments are calculated from order <order_min> up to order <order_max>
               for all valid repetition values. The range of the valid moment order parameters
               lies between 0 and 20.

               The command zernike_fmoment does its work silently. With the -v (Verbose)
               option given, the calculated moment values are echoed on screen during
               execution.

■ options      -v  Verbose option

■ example      Obtain complex Zernike moments from order 0 up to order 12 of the input
               image object.vff making use of the predefined Zernike templates <moment> and
               store the results in object.zm:

               example% zernike_fmoment object.vff object.zm moment 0 12 -v

■ see also

■ remarks
■ name         zernike_reconstruct - Active Vision image reconstructor based on Zernike
               moments

■ synopsis     zernike_reconstruct input.vff output.vff momentfile order_min order_max
               [-v]

■ description  zernike_reconstruct generates an image based on the complex Zernike moments
               stored in file <momentfile><order><repetition>.vff from order <order_min>
               up to order <order_max>. The range of the valid moment order parameters lies
               between 0 and 20.

               The output image <output.vff> is the sum of the input image <input.vff> and
               the generated image.

               The command zernike_reconstruct does its work silently. With the -v (Verbose)
               option given, the order and repetition of the moment filenames are echoed on
               screen during execution.

■ options      -v  Verbose option

■ example      Reconstruct an image out of the complex Zernike moments stored in the file
               object.zm from order 6 up to order 12, add the result to the partially
               reconstructed image rec00_05.vff and store the result in rec00_12.vff:

               example% zernike_reconstruct rec00_05.vff rec00_12.vff object.zm 6 12 -v

■ see also     zernike_moment
■ name         zernike_freconstruct - Active Vision fast image reconstructor based on Zernike
               moments

■ synopsis     zernike_freconstruct input.vff output.vff momentfile templatefile
               order_min order_max [-v]

■ description  zernike_freconstruct generates an image based on the complex Zernike
               moments stored in file <momentfile><order><repetition>.vff from order
               <order_min> up to order <order_max>. The range of the valid moment order
               parameters lies between 0 and 20.

               The output image <output.vff> is the sum of the input image <input.vff> and
               the generated image.

               The calculation is based on the predefined Zernike moment templates
               <templatefile><order><repetition>.vff
               for fast calculation.

               The command zernike_freconstruct does its work silently. With the -v
               (Verbose) option given, the order and repetition of the moment filenames are
               echoed on screen during execution.

■ options      -v  Verbose option

■ example      Reconstruct an image out of the complex Zernike moments stored in the file
               object.zm from order 6 up to order 12, add the result to the partially
               reconstructed image rec00_05.vff and store the result in rec00_12.vff:

               example% zernike_freconstruct rec00_05.vff rec00_12.vff object.zm
               moment 6 12 -v

■ see also     zernike_fmoment
■ name         nll_edge - Active Vision edge detector based on Nonlinear Laplace operator

■ synopsis     nll_edge input.vff output.vff -g [0|3|5] -l [3|5] [-v]

■ description  nll_edge generates an edge image of the input image <input.vff> and stores the
               result in <output.vff>. The input image can be preprocessed by a Gauss kernel
               by selecting <-g 3> (sigma = sqrt(0.5)) or <-g 5> (sigma = sqrt(2.0)). For <-g
               0> no Gauss kernel is applied. The Nonlinear Laplace kernel size can be
               chosen between <-l 3> and <-l 5>.

               The command nll_edge does its work silently. With the -v (Verbose) option
               given, the different processing steps are echoed on screen during execution.

               By thresholding the output image a bi-valued edge image can be obtained.

■ options      -v  Verbose option

■ example      Obtain avFLOAT edge image <output.vff> of input image <input.vff>,
               preprocessing the input image with a Gauss kernel of size 5, by applying a
               Nonlinear Laplace operator of size 5:

               example% nll_edge input.vff output.vff -g 5 -l 5 -v

■ see also     Vliet, L.J. van, and Young, I.T., "A Nonlinear Laplace Operator as Edge
               Detector in Noisy Images", Computer Vision, Graphics, and Image Processing,
               45, pp. 167-195, 1989.
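
The operator behind nll_edge can be sketched as follows, assuming the usual definition of the nonlinear Laplace operator as the sum of the local maximum and the local minimum minus twice the centre value. The function name is hypothetical, and the Gauss preprocessing and the border handling of the actual command are not reproduced; here the window is simply clipped at the image border.

```python
def nonlinear_laplace(image, size=3):
    """Nonlinear Laplace response of a grey-value image given as a list of rows."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the size x size neighbourhood, clipped at the border.
            window = [image[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            out[r][c] = max(window) + min(window) - 2.0 * image[r][c]
    return out


step = [[0, 0, 1, 1]] * 3
print(nonlinear_laplace(step)[1])  # [0.0, 1.0, -1.0, 0.0]
```

The sign change between the second and third column marks the position of the step edge, which is where a threshold or zero-crossing detector would place the edge.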
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